Overview

Dataset statistics

Number of variables: 22
Number of observations: 45451
Missing cells: 94959
Missing cells (%): 9.5%
Duplicate rows: 50
Duplicate rows (%): 0.1%
Total size in memory: 7.6 MiB
Average record size in memory: 176.0 B
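
The percentages in this block follow directly from the raw counts. The sketch below recomputes them as a sanity check; the variable names are illustrative and the only inputs are the figures reported above.

```python
# Figures copied from the Dataset statistics block.
n_variables = 22
n_observations = 45451
missing_cells = 94959
duplicate_rows = 50
avg_record_bytes = 176.0

total_cells = n_variables * n_observations              # every cell in the table
missing_pct = 100 * missing_cells / total_cells         # share of empty cells
duplicate_pct = 100 * duplicate_rows / n_observations   # share of duplicated rows
total_mib = avg_record_bytes * n_observations / 2**20   # bytes converted to MiB

print(f"Total cells: {total_cells}")
print(f"Missing cells: {missing_pct:.1f}%")
print(f"Duplicate rows: {duplicate_pct:.1f}%")
print(f"Total size: {total_mib:.1f} MiB")
```

The recomputed values agree with the reported 9.5%, 0.1% and 7.6 MiB to the displayed precision.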

Variable types

Text: 11
Numeric: 9
DateTime: 1
Categorical: 1

Alerts

Dataset has 50 (0.1%) duplicate rows [Duplicates]
budget is highly overall correlated with revenue and 1 other field [High correlation]
popularity is highly overall correlated with vote_count [High correlation]
revenue is highly overall correlated with budget and 2 other fields [High correlation]
vote_count is highly overall correlated with popularity and 1 other field [High correlation]
return is highly overall correlated with budget and 1 other field [High correlation]
status is highly imbalanced (97.0%) [Imbalance]
belongs_to_collection has 41061 (90.3%) missing values [Missing]
genres has 2384 (5.2%) missing values [Missing]
overview has 941 (2.1%) missing values [Missing]
production_companies has 11875 (26.1%) missing values [Missing]
production_countries has 6220 (13.7%) missing values [Missing]
spoken_languages has 3995 (8.8%) missing values [Missing]
tagline has 25026 (55.1%) missing values [Missing]
cast has 2364 (5.2%) missing values [Missing]
crew has 756 (1.7%) missing values [Missing]
popularity is highly skewed (γ1 = 29.21395045) [Skewed]
return is highly skewed (γ1 = 138.4438078) [Skewed]
budget has 36543 (80.4%) zeros [Zeros]
revenue has 38024 (83.7%) zeros [Zeros]
runtime has 1535 (3.4%) zeros [Zeros]
vote_average has 2953 (6.5%) zeros [Zeros]
vote_count has 2855 (6.3%) zeros [Zeros]
return has 40058 (88.1%) zeros [Zeros]
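
The γ1 values in the Skewed alerts are skewness coefficients. As a minimal sketch, assuming the plain population (biased) Fisher-Pearson definition g1 = m3 / m2^1.5, the function below shows how a zero-heavy column with a few large values produces positive skew. The report's own estimator may apply a small-sample correction, so this is an illustration of the quantity, not the tool's exact code; the sample list is made up.

```python
def skewness(values):
    """Population (biased) Fisher-Pearson skewness: g1 = m3 / m2**1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Hypothetical toy data: mostly zeros with one large value, like budget or return.
zero_heavy = [0, 0, 0, 0, 10]
symmetric = [1, 2, 3]

print(skewness(zero_heavy))  # positive: long right tail
print(skewness(symmetric))   # zero: no asymmetry
```

Columns dominated by zeros with a long right tail, as the Zeros alerts describe, are exactly the shape that drives γ1 upward.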

Reproduction

Analysis started: 2023-06-13 01:39:41.093456
Analysis finished: 2023-06-13 01:40:38.474153
Duration: 57.38 seconds
Software version: ydata-profiling v4.2.0
Download configuration: config.json
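
A report like this one can be regenerated with ydata-profiling's `ProfileReport` class. The sketch below is a minimal, hedged example: the input file name `movies.csv`, the report title and the helper name `build_report` are assumptions for illustration, since the source does not state where the data came from. The third-party imports sit inside the function so the definition itself has no hard dependency.

```python
def build_report(csv_path, out_path="report.html"):
    """Profile a CSV file and write an HTML report (requires pandas and ydata-profiling)."""
    # Imported lazily so that merely defining this helper needs no third-party packages.
    import pandas as pd
    from ydata_profiling import ProfileReport

    df = pd.read_csv(csv_path)                       # hypothetical input file
    profile = ProfileReport(df, title="Profiling report")
    profile.to_file(out_path)                        # writes the HTML report
    return out_path
```

Calling `build_report("movies.csv")` would write `report.html`. Exact numbers can differ between library versions, so pinning the version shown above (v4.2.0) is the safer way to reproduce this output.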

Variables

belongs_to_collection
Text

Distinct: 1644
Distinct (%): 37.4%
Missing: 41061
Missing (%): 90.3%
Memory size: 355.2 KiB

Length

Max length: 54
Median length: 43
Mean length: 23.752392
Min length: 3

Characters and Unicode

Total characters: 104273
Distinct characters: 164
Distinct categories: 12
Distinct scripts: 7
Distinct blocks: 8
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
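
As a small illustration of those character properties, Python's standard `unicodedata` module exposes the general category of each code point. The sketch below counts categories for one of the sample values; the helper name is illustrative, and the two-letter codes map to the names used in this report (`Lu` = Uppercase Letter, `Ll` = Lowercase Letter, `Zs` = Space Separator).

```python
import unicodedata
from collections import Counter

def category_counts(text):
    """Count characters by Unicode general category (e.g. 'Lu', 'Ll', 'Zs')."""
    return Counter(unicodedata.category(ch) for ch in text)

sample = "Toy Story Collection"
counts = category_counts(sample)
print(counts)
```

Summing such counts over every non-missing value of a column gives the kind of per-category totals shown in the tables that follow.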

Unique

Unique: 374
Unique (%): 8.5%

Sample

1st row: Toy Story Collection
2nd row: Grumpy Old Men Collection
3rd row: Father of the Bride Collection
4th row: James Bond Collection
5th row: Balto Collection

Value                  Count   Frequency (%)
collection             3656    25.3%
the                    1139    7.9%
of                     229     1.6%
series                 146     1.0%
                       133     0.9%
and                    84      0.6%
trilogy                82      0.6%
man                    60      0.4%
a                      60      0.4%
in                     56      0.4%
Other values (2316)    8779    60.9%

Most occurring characters

Value                  Count   Frequency (%)
o                      10830   10.4%
e                      10223   9.8%
(space)                10035   9.6%
l                      9928    9.5%
i                      7356    7.1%
n                      7215    6.9%
t                      6329    6.1%
c                      4737    4.5%
C                      4363    4.2%
a                      4330    4.2%
Other values (154)     28927   27.7%

Most occurring categories

Value                  Count   Frequency (%)
Lowercase Letter       79037   75.8%
Uppercase Letter       13554   13.0%
Space Separator        10035   9.6%
Other Punctuation      459     0.4%
Close Punctuation      333     0.3%
Open Punctuation       333     0.3%
Decimal Number         321     0.3%
Dash Punctuation       150     0.1%
Other Letter           37      < 0.1%
Final Punctuation      9       < 0.1%
Other values (2)       5       < 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 10830
13.7%
e 10223
12.9%
l 9928
12.6%
i 7356
9.3%
n 7215
9.1%
t 6329
8.0%
c 4737
6.0%
a 4330
 
5.5%
r 3796
 
4.8%
s 2454
 
3.1%
Other values (68) 11839
15.0%
Uppercase Letter
ValueCountFrequency (%)
C 4363
32.2%
T 1502
 
11.1%
S 1052
 
7.8%
B 667
 
4.9%
M 608
 
4.5%
D 499
 
3.7%
A 490
 
3.6%
H 447
 
3.3%
P 419
 
3.1%
G 412
 
3.0%
Other values (33) 3095
22.8%
Other Letter
ValueCountFrequency (%)
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
3
 
8.1%
2
 
5.4%
Other values (4) 8
21.6%
Decimal Number
ValueCountFrequency (%)
1 80
24.9%
9 64
19.9%
3 54
16.8%
0 51
15.9%
2 21
 
6.5%
8 13
 
4.0%
5 12
 
3.7%
7 11
 
3.4%
6 10
 
3.1%
4 5
 
1.6%
Other Punctuation
ValueCountFrequency (%)
. 168
36.6%
: 99
21.6%
, 76
16.6%
& 50
 
10.9%
! 34
 
7.4%
/ 21
 
4.6%
* 4
 
0.9%
? 4
 
0.9%
3
 
0.7%
Close Punctuation
ValueCountFrequency (%)
) 328
98.5%
] 5
 
1.5%
Open Punctuation
ValueCountFrequency (%)
( 328
98.5%
[ 5
 
1.5%
Dash Punctuation
ValueCountFrequency (%)
- 148
98.7%
2
 
1.3%
Space Separator
ValueCountFrequency (%)
10035
100.0%
Final Punctuation
ValueCountFrequency (%)
9
100.0%
Modifier Letter
ValueCountFrequency (%)
3
100.0%
Other Number
ValueCountFrequency (%)
½ 2
100.0%

Most occurring scripts

Value                  Count   Frequency (%)
Latin                  92177   88.4%
Common                 11645   11.2%
Cyrillic               414     0.4%
Hiragana               15      < 0.1%
Hangul                 10      < 0.1%
Katakana               9       < 0.1%
Han                    3       < 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 10830
11.7%
e 10223
11.1%
l 9928
10.8%
i 7356
 
8.0%
n 7215
 
7.8%
t 6329
 
6.9%
c 4737
 
5.1%
C 4363
 
4.7%
a 4330
 
4.7%
r 3796
 
4.1%
Other values (69) 23070
25.0%
Cyrillic
ValueCountFrequency (%)
л 48
 
11.6%
и 41
 
9.9%
о 37
 
8.9%
к 30
 
7.2%
е 27
 
6.5%
я 25
 
6.0%
а 17
 
4.1%
ц 16
 
3.9%
К 16
 
3.9%
р 14
 
3.4%
Other values (32) 143
34.5%
Common
ValueCountFrequency (%)
10035
86.2%
) 328
 
2.8%
( 328
 
2.8%
. 168
 
1.4%
- 148
 
1.3%
: 99
 
0.9%
1 80
 
0.7%
, 76
 
0.7%
9 64
 
0.5%
3 54
 
0.5%
Other values (19) 265
 
2.3%
Hiragana
ValueCountFrequency (%)
3
20.0%
3
20.0%
3
20.0%
3
20.0%
3
20.0%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%
Katakana
ValueCountFrequency (%)
3
33.3%
3
33.3%
3
33.3%
Han
ValueCountFrequency (%)
3
100.0%

Most occurring blocks

Value                  Count   Frequency (%)
ASCII                  103559  99.3%
Cyrillic               414     0.4%
None                   246     0.2%
Hiragana               15      < 0.1%
Punctuation            14      < 0.1%
Katakana               12      < 0.1%
Hangul                 10      < 0.1%
CJK                    3       < 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 10830
 
10.5%
e 10223
 
9.9%
10035
 
9.7%
l 9928
 
9.6%
i 7356
 
7.1%
n 7215
 
7.0%
t 6329
 
6.1%
c 4737
 
4.6%
C 4363
 
4.2%
a 4330
 
4.2%
Other values (66) 28213
27.2%
None
ValueCountFrequency (%)
é 49
19.9%
ä 38
15.4%
ô 35
14.2%
ò 28
11.4%
ö 19
 
7.7%
ı 14
 
5.7%
ó 14
 
5.7%
í 9
 
3.7%
á 4
 
1.6%
İ 4
 
1.6%
Other values (18) 32
13.0%
Cyrillic
ValueCountFrequency (%)
л 48
 
11.6%
и 41
 
9.9%
о 37
 
8.9%
к 30
 
7.2%
е 27
 
6.5%
я 25
 
6.0%
а 17
 
4.1%
ц 16
 
3.9%
К 16
 
3.9%
р 14
 
3.4%
Other values (32) 143
34.5%
Punctuation
ValueCountFrequency (%)
9
64.3%
3
 
21.4%
2
 
14.3%
Hiragana
ValueCountFrequency (%)
3
20.0%
3
20.0%
3
20.0%
3
20.0%
3
20.0%
CJK
ValueCountFrequency (%)
3
100.0%
Katakana
ValueCountFrequency (%)
3
25.0%
3
25.0%
3
25.0%
3
25.0%
Hangul
ValueCountFrequency (%)
2
20.0%
2
20.0%
2
20.0%
2
20.0%
2
20.0%

budget
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct: 1223
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 4231294.1
Minimum: 0
Maximum: 3.8 × 10^8
Zeros: 36543
Zeros (%): 80.4%
Negative: 0
Negative (%): 0.0%
Memory size: 355.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0
Q1: 0
median: 0
Q3: 0
95-th percentile: 25000000
Maximum: 3.8 × 10^8
Range: 3.8 × 10^8
Interquartile range (IQR): 0

Descriptive statistics

Standard deviation: 17429415
Coefficient of variation (CV): 4.119169
Kurtosis: 66.689604
Mean: 4231294.1
Median Absolute Deviation (MAD): 0
Skewness: 7.1202048
Sum: 1.9231655 × 10^11
Variance: 3.0378452 × 10^14
Monotonicity: Not monotonic
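
Several of the descriptive statistics for budget are tied together by simple identities: the coefficient of variation is the standard deviation divided by the mean, the variance is the squared standard deviation, and the mean is the sum divided by the number of observations. The sketch below checks these relationships using only the rounded figures reported here, so small rounding differences are expected.

```python
import math

n = 45451                  # number of observations (budget has no missing values)
mean = 4231294.1
std = 17429415
total = 1.9231655e11       # reported Sum
variance = 3.0378452e14    # reported Variance

cv = std / mean            # coefficient of variation
print(f"CV: {cv:.6f}")
print(f"std squared matches variance: {math.isclose(std ** 2, variance, rel_tol=1e-6)}")
print(f"sum / n matches mean: {math.isclose(total / n, mean, rel_tol=1e-6)}")
```

A CV above 4 alongside a median of 0 reflects the same pattern as the Zeros alert: most rows carry a budget of zero while a few are very large.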
Histogram with fixed size bins (bins=50)
Value                  Count   Frequency (%)
0                      36543   80.4%
5000000                286     0.6%
10000000               261     0.6%
20000000               243     0.5%
2000000                242     0.5%
15000000               226     0.5%
3000000                223     0.5%
25000000               206     0.5%
1000000                197     0.4%
30000000               192     0.4%
Other values (1213)    6832    15.0%

Value                  Count   Frequency (%)
0                      36543   80.4%
1                      25      0.1%
2                      14      < 0.1%
3                      9       < 0.1%
4                      10      < 0.1%
5                      8       < 0.1%
6                      5       < 0.1%
7                      4       < 0.1%
8                      5       < 0.1%
9                      1       < 0.1%

Value                  Count   Frequency (%)
380000000              1       < 0.1%
300000000              1       < 0.1%
280000000              1       < 0.1%
270000000              1       < 0.1%
260000000              3       < 0.1%
258000000              1       < 0.1%
255000000              1       < 0.1%
250000000              10      < 0.1%
245000000              2       < 0.1%
237000000              1       < 0.1%

genres
Text

Distinct: 4064
Distinct (%): 9.4%
Missing: 2384
Missing (%): 5.2%
Memory size: 355.2 KiB

Length

Max length: 80
Median length: 65
Mean length: 16.470639
Min length: 3

Characters and Unicode

Total characters: 709341
Distinct characters: 30
Distinct categories: 4
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 2363
Unique (%): 5.5%

Sample

1st row: Animation, Comedy, Family
2nd row: Adventure, Fantasy, Family
3rd row: Romance, Comedy
4th row: Comedy, Drama, Romance
5th row: Comedy

Value                  Count   Frequency (%)
drama                  20302   21.4%
comedy                 13195   13.9%
thriller               7635    8.0%
romance                6744    7.1%
action                 6603    6.9%
horror                 4676    4.9%
crime                  4312    4.5%
documentary            3926    4.1%
adventure              3506    3.7%
science                3054    3.2%
Other values (12)      21093   22.2%

Most occurring characters

ValueCountFrequency (%)
r 69209
 
9.8%
a 61959
 
8.7%
e 55885
 
7.9%
m 53195
 
7.5%
51979
 
7.3%
o 48608
 
6.9%
, 48157
 
6.8%
i 39762
 
5.6%
n 35762
 
5.0%
y 28562
 
4.0%
Other values (20) 216263
30.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 513391
72.4%
Uppercase Letter 95814
 
13.5%
Space Separator 51979
 
7.3%
Other Punctuation 48157
 
6.8%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
r 69209
13.5%
a 61959
12.1%
e 55885
10.9%
m 53195
10.4%
o 48608
9.5%
i 39762
7.7%
n 35762
7.0%
y 28562
5.6%
c 28035
5.5%
t 26266
 
5.1%
Other values (7) 66148
12.9%
Uppercase Letter
ValueCountFrequency (%)
D 24228
25.3%
C 17507
18.3%
A 12051
12.6%
F 9777
10.2%
T 8403
 
8.8%
R 6744
 
7.0%
H 6073
 
6.3%
M 4842
 
5.1%
S 3054
 
3.2%
W 2367
 
2.5%
Space Separator
ValueCountFrequency (%)
51979
100.0%
Other Punctuation
ValueCountFrequency (%)
, 48157
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 609205
85.9%
Common 100136
 
14.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
r 69209
11.4%
a 61959
 
10.2%
e 55885
 
9.2%
m 53195
 
8.7%
o 48608
 
8.0%
i 39762
 
6.5%
n 35762
 
5.9%
y 28562
 
4.7%
c 28035
 
4.6%
t 26266
 
4.3%
Other values (18) 161962
26.6%
Common
ValueCountFrequency (%)
51979
51.9%
, 48157
48.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII 709341
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
r 69209
 
9.8%
a 61959
 
8.7%
e 55885
 
7.9%
m 53195
 
7.5%
51979
 
7.3%
o 48608
 
6.9%
, 48157
 
6.8%
i 39762
 
5.6%
n 35762
 
5.0%
y 28562
 
4.0%
Other values (20) 216263
30.5%

id
Real number (ℝ)

Distinct: 45345
Distinct (%): 99.8%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 107984.6
Minimum: 2
Maximum: 469172
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 355.2 KiB

Quantile statistics

Minimum: 2
5-th percentile: 5364
Q1: 26367.5
median: 59871
Q3: 156327.5
95-th percentile: 357045
Maximum: 469172
Range: 469170
Interquartile range (IQR): 129960

Descriptive statistics

Standard deviation: 112111.34
Coefficient of variation (CV): 1.038216
Kurtosis: 0.56268656
Mean: 107984.6
Median Absolute Deviation (MAD): 44448
Skewness: 1.2836233
Sum: 4.908008 × 10^9
Variance: 1.2568952 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
141971 9
 
< 0.1%
69234 4
 
< 0.1%
152795 4
 
< 0.1%
84198 4
 
< 0.1%
110428 4
 
< 0.1%
159849 4
 
< 0.1%
42495 4
 
< 0.1%
23305 4
 
< 0.1%
132641 4
 
< 0.1%
97995 4
 
< 0.1%
Other values (45335) 45406
99.9%
ValueCountFrequency (%)
2 1
< 0.1%
3 1
< 0.1%
5 1
< 0.1%
6 1
< 0.1%
11 1
< 0.1%
12 1
< 0.1%
13 1
< 0.1%
14 1
< 0.1%
15 1
< 0.1%
16 1
< 0.1%
ValueCountFrequency (%)
469172 1
< 0.1%
468707 1
< 0.1%
468343 1
< 0.1%
467731 1
< 0.1%
465044 1
< 0.1%
464819 1
< 0.1%
464207 1
< 0.1%
464111 1
< 0.1%
463906 1
< 0.1%
463800 1
< 0.1%
Distinct: 89
Distinct (%): 0.2%
Missing: 11
Missing (%): < 0.1%
Memory size: 355.2 KiB

Length

Max length: 2
Median length: 2
Mean length: 2
Min length: 2

Characters and Unicode

Total characters: 90880
Distinct characters: 26
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 17
Unique (%): < 0.1%

Sample

1st row: en
2nd row: en
3rd row: en
4th row: en
5th row: en

Value                  Count   Frequency (%)
en                     32249   71.0%
fr                     2442    5.4%
it                     1528    3.4%
ja                     1355    3.0%
de                     1081    2.4%
es                     991     2.2%
ru                     822     1.8%
hi                     508     1.1%
ko                     444     1.0%
zh                     408     0.9%
Other values (79)      3612    7.9%
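
The frequency column for this variable appears to be computed over non-missing rows only. The sketch below checks that reading against the reported figures: with 11 missing values out of 45451 rows, 45440 values remain, which also matches the total character count of 90880 for two-letter codes. The dictionary holds only the top five counts copied from the table.

```python
n_observations = 45451
n_missing = 11
non_missing = n_observations - n_missing   # rows that actually carry a value

# Top five counts copied from the value table.
counts = {"en": 32249, "fr": 2442, "it": 1528, "ja": 1355, "de": 1081}

# Share of each value among non-missing rows, rounded like the report.
shares = {code: round(100 * c / non_missing, 1) for code, c in counts.items()}
print(shares)

# Every value is exactly two characters long, so characters = 2 * values.
print(2 * non_missing)
```

Both checks line up with the table, which supports (but does not prove) the assumption that missing rows are excluded from the denominator.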

Most occurring characters

ValueCountFrequency (%)
e 34576
38.0%
n 32957
36.3%
r 3635
 
4.0%
f 2848
 
3.1%
i 2394
 
2.6%
t 2252
 
2.5%
a 1849
 
2.0%
s 1653
 
1.8%
j 1356
 
1.5%
d 1328
 
1.5%
Other values (16) 6032
 
6.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 90880
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 34576
38.0%
n 32957
36.3%
r 3635
 
4.0%
f 2848
 
3.1%
i 2394
 
2.6%
t 2252
 
2.5%
a 1849
 
2.0%
s 1653
 
1.8%
j 1356
 
1.5%
d 1328
 
1.5%
Other values (16) 6032
 
6.6%

Most occurring scripts

ValueCountFrequency (%)
Latin 90880
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 34576
38.0%
n 32957
36.3%
r 3635
 
4.0%
f 2848
 
3.1%
i 2394
 
2.6%
t 2252
 
2.5%
a 1849
 
2.0%
s 1653
 
1.8%
j 1356
 
1.5%
d 1328
 
1.5%
Other values (16) 6032
 
6.6%

Most occurring blocks

ValueCountFrequency (%)
ASCII 90880
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 34576
38.0%
n 32957
36.3%
r 3635
 
4.0%
f 2848
 
3.1%
i 2394
 
2.6%
t 2252
 
2.5%
a 1849
 
2.0%
s 1653
 
1.8%
j 1356
 
1.5%
d 1328
 
1.5%
Other values (16) 6032
 
6.6%

overview
Text

Distinct: 44231
Distinct (%): 99.4%
Missing: 941
Missing (%): 2.1%
Memory size: 355.2 KiB

Length

Max length: 1000
Median length: 785.5
Mean length: 323.34062
Min length: 1

Characters and Unicode

Total characters: 14391891
Distinct characters: 429
Distinct categories: 25
Distinct scripts: 13
Distinct blocks: 21
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 44158
Unique (%): 99.2%

Sample

1st row: Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences.
2nd row: When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures.
3rd row: A family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max.
4th row: Cheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe.
5th row: Just when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own.

Value                  Count     Frequency (%)
the                    138354    5.6%
a                      99047     4.0%
and                    75412     3.1%
to                     73460     3.0%
of                     69697     2.8%
in                     48229     2.0%
is                     36550     1.5%
his                    36245     1.5%
with                   23952     1.0%
her                    21534     0.9%
Other values (97091)   1830707   74.6%

Most occurring characters

ValueCountFrequency (%)
2410755
16.8%
e 1366185
 
9.5%
a 942297
 
6.5%
t 936443
 
6.5%
i 853069
 
5.9%
o 831349
 
5.8%
n 824127
 
5.7%
s 769276
 
5.3%
r 745721
 
5.2%
h 601895
 
4.2%
Other values (419) 4110774
28.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 11170382
77.6%
Space Separator 2410793
 
16.8%
Uppercase Letter 391701
 
2.7%
Other Punctuation 313376
 
2.2%
Decimal Number 42294
 
0.3%
Dash Punctuation 36817
 
0.3%
Close Punctuation 10115
 
0.1%
Open Punctuation 10092
 
0.1%
Final Punctuation 4570
 
< 0.1%
Initial Punctuation 886
 
< 0.1%
Other values (15) 865
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 1366185
12.2%
a 942297
 
8.4%
t 936443
 
8.4%
i 853069
 
7.6%
o 831349
 
7.4%
n 824127
 
7.4%
s 769276
 
6.9%
r 745721
 
6.7%
h 601895
 
5.4%
l 479707
 
4.3%
Other values (142) 2820313
25.2%
Uppercase Letter
ValueCountFrequency (%)
A 42818
 
10.9%
T 36031
 
9.2%
S 31186
 
8.0%
M 23991
 
6.1%
B 23743
 
6.1%
C 22870
 
5.8%
H 19462
 
5.0%
W 18697
 
4.8%
I 16837
 
4.3%
D 16325
 
4.2%
Other values (77) 139741
35.7%
Other Letter
ValueCountFrequency (%)
6
 
4.8%
6
 
4.8%
5
 
4.0%
4
 
3.2%
3
 
2.4%
3
 
2.4%
3
 
2.4%
3
 
2.4%
2
 
1.6%
2
 
1.6%
Other values (76) 88
70.4%
Other Punctuation
ValueCountFrequency (%)
, 133694
42.7%
. 125010
39.9%
' 31176
 
9.9%
" 11669
 
3.7%
: 3309
 
1.1%
? 2760
 
0.9%
; 2496
 
0.8%
! 1549
 
0.5%
/ 765
 
0.2%
& 455
 
0.1%
Other values (12) 493
 
0.2%
Nonspacing Mark
ValueCountFrequency (%)
ి 4
12.1%
́ 4
12.1%
3
9.1%
̈ 3
9.1%
3
9.1%
3
9.1%
2
 
6.1%
2
 
6.1%
2
 
6.1%
2
 
6.1%
Other values (4) 5
15.2%
Decimal Number
ValueCountFrequency (%)
1 9770
23.1%
0 8273
19.6%
9 6417
15.2%
2 4256
10.1%
5 2442
 
5.8%
8 2381
 
5.6%
3 2353
 
5.6%
4 2183
 
5.2%
7 2131
 
5.0%
6 2088
 
4.9%
Spacing Mark
ValueCountFrequency (%)
11
40.7%
4
 
14.8%
3
 
11.1%
3
 
11.1%
2
 
7.4%
ि 2
 
7.4%
1
 
3.7%
ி 1
 
3.7%
Dash Punctuation
ValueCountFrequency (%)
- 35294
95.9%
881
 
2.4%
633
 
1.7%
5
 
< 0.1%
4
 
< 0.1%
Other Symbol
ValueCountFrequency (%)
® 45
70.3%
14
 
21.9%
° 2
 
3.1%
¦ 2
 
3.1%
1
 
1.6%
Math Symbol
ValueCountFrequency (%)
~ 20
50.0%
+ 11
27.5%
= 6
 
15.0%
| 2
 
5.0%
1
 
2.5%
Open Punctuation
ValueCountFrequency (%)
( 10039
99.5%
[ 50
 
0.5%
{ 2
 
< 0.1%
1
 
< 0.1%
Currency Symbol
ValueCountFrequency (%)
$ 317
96.4%
£ 10
 
3.0%
1
 
0.3%
1
 
0.3%
Space Separator
ValueCountFrequency (%)
2410755
> 99.9%
  36
 
< 0.1%
  2
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 10063
99.5%
] 50
 
0.5%
} 2
 
< 0.1%
Final Punctuation
ValueCountFrequency (%)
3857
84.4%
694
 
15.2%
» 19
 
0.4%
Initial Punctuation
ValueCountFrequency (%)
676
76.3%
192
 
21.7%
« 18
 
2.0%
Control
ValueCountFrequency (%)
106
96.4%
’ 3
 
2.7%
 1
 
0.9%
Modifier Symbol
ValueCountFrequency (%)
´ 25
65.8%
` 12
31.6%
¯ 1
 
2.6%
Format
ValueCountFrequency (%)
31
60.8%
­ 20
39.2%
Other Number
ValueCountFrequency (%)
¹ 8
50.0%
½ 8
50.0%
Connector Punctuation
ValueCountFrequency (%)
_ 19
100.0%
Line Separator
ValueCountFrequency (%)
7
100.0%
Letter Number
ValueCountFrequency (%)
2
100.0%
Modifier Letter
ValueCountFrequency (%)
ʼ 2
100.0%
Paragraph Separator
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 11556851
80.3%
Common 2829621
 
19.7%
Cyrillic 4587
 
< 0.1%
Greek 648
 
< 0.1%
Devanagari 77
 
< 0.1%
Telugu 30
 
< 0.1%
Hiragana 20
 
< 0.1%
Tamil 19
 
< 0.1%
Han 10
 
< 0.1%
Hangul 9
 
< 0.1%
Other values (3) 19
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 1366185
11.8%
a 942297
 
8.2%
t 936443
 
8.1%
i 853069
 
7.4%
o 831349
 
7.2%
n 824127
 
7.1%
s 769276
 
6.7%
r 745721
 
6.5%
h 601895
 
5.2%
l 479707
 
4.2%
Other values (132) 3206782
27.7%
Common
ValueCountFrequency (%)
2410755
85.2%
, 133694
 
4.7%
. 125010
 
4.4%
- 35294
 
1.2%
' 31176
 
1.1%
" 11669
 
0.4%
) 10063
 
0.4%
( 10039
 
0.4%
1 9770
 
0.3%
0 8273
 
0.3%
Other values (71) 43878
 
1.6%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Greek
ValueCountFrequency (%)
α 60
 
9.3%
ο 55
 
8.5%
τ 43
 
6.6%
η 36
 
5.6%
ι 36
 
5.6%
ν 34
 
5.2%
ρ 31
 
4.8%
ε 31
 
4.8%
π 30
 
4.6%
ς 30
 
4.6%
Other values (33) 262
40.4%
Devanagari
ValueCountFrequency (%)
11
 
14.3%
6
 
7.8%
6
 
7.8%
5
 
6.5%
4
 
5.2%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
Other values (21) 30
39.0%
Hiragana
ValueCountFrequency (%)
4
20.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (7) 7
35.0%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Han
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Inherited
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%

Most occurring blocks

ValueCountFrequency (%)
ASCII 14373857
99.9%
Punctuation 7288
 
0.1%
None 5948
 
< 0.1%
Cyrillic 4587
 
< 0.1%
Devanagari 77
 
< 0.1%
Telugu 30
 
< 0.1%
Hiragana 20
 
< 0.1%
Tamil 19
 
< 0.1%
Letterlike Symbols 14
 
< 0.1%
CJK 10
 
< 0.1%
Other values (11) 41
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
2410755
16.8%
e 1366185
 
9.5%
a 942297
 
6.6%
t 936443
 
6.5%
i 853069
 
5.9%
o 831349
 
5.8%
n 824127
 
5.7%
s 769276
 
5.4%
r 745721
 
5.2%
h 601895
 
4.2%
Other values (82) 4092740
28.5%
Punctuation
ValueCountFrequency (%)
3857
52.9%
881
 
12.1%
694
 
9.5%
676
 
9.3%
633
 
8.7%
303
 
4.2%
192
 
2.6%
31
 
0.4%
7
 
0.1%
5
 
0.1%
Other values (4) 9
 
0.1%
None
ValueCountFrequency (%)
é 1568
26.4%
ä 294
 
4.9%
á 293
 
4.9%
ö 250
 
4.2%
í 243
 
4.1%
è 209
 
3.5%
ü 178
 
3.0%
ı 165
 
2.8%
ó 164
 
2.8%
ç 158
 
2.7%
Other values (141) 2426
40.8%
Cyrillic
ValueCountFrequency (%)
о 470
 
10.2%
е 404
 
8.8%
а 373
 
8.1%
н 323
 
7.0%
и 299
 
6.5%
т 265
 
5.8%
р 240
 
5.2%
с 218
 
4.8%
в 173
 
3.8%
л 161
 
3.5%
Other values (46) 1661
36.2%
Letterlike Symbols
ValueCountFrequency (%)
14
100.0%
Devanagari
ValueCountFrequency (%)
11
 
14.3%
6
 
7.8%
6
 
7.8%
5
 
6.5%
4
 
5.2%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
3
 
3.9%
Other values (21) 30
39.0%
Telugu
ValueCountFrequency (%)
ి 4
13.3%
3
10.0%
3
10.0%
3
10.0%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
2
 
6.7%
1
 
3.3%
Other values (6) 6
20.0%
Hiragana
ValueCountFrequency (%)
4
20.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
1
 
5.0%
Other values (7) 7
35.0%
Alphabetic PF
ValueCountFrequency (%)
4
100.0%
Diacriticals
ValueCountFrequency (%)
́ 4
57.1%
̈ 3
42.9%
Tamil
ValueCountFrequency (%)
3
15.8%
2
10.5%
2
10.5%
2
10.5%
2
10.5%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
1
 
5.3%
Other values (3) 3
15.8%
Number Forms
ValueCountFrequency (%)
2
100.0%
Arabic
ValueCountFrequency (%)
م 2
50.0%
ہ 1
25.0%
ت 1
25.0%
Hangul
ValueCountFrequency (%)
2
22.2%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
1
11.1%
Modifier Letters
ValueCountFrequency (%)
ʼ 2
100.0%
Thai
ValueCountFrequency (%)
2
25.0%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
CJK
ValueCountFrequency (%)
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
1
10.0%
Math Operators
ValueCountFrequency (%)
1
100.0%
Katakana
ValueCountFrequency (%)
1
100.0%
Currency Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
Specials
ValueCountFrequency (%)
1
100.0%

popularity
Real number (ℝ)

HIGH CORRELATION  SKEWED 

Distinct: 43730
Distinct (%): 96.2%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2.9268873
Minimum: 0
Maximum: 547.4883
Zeros: 40
Zeros (%): 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 355.2 KiB

Quantile statistics

Minimum: 0
5-th percentile: 0.0207485
Q1: 0.3890945
median: 1.131386
Q3: 3.6943695
95-th percentile: 11.063692
Maximum: 547.4883
Range: 547.4883
Interquartile range (IQR): 3.305275
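
The interquartile range is simply Q3 minus Q1. The sketch below recomputes it for popularity and, as an illustration only, derives the upper Tukey fence (Q3 + 1.5 × IQR). The fence is a common convention that the report itself does not compute; it is included here to show how far the tail extends.

```python
q1, q3 = 0.3890945, 3.6943695   # reported quartiles for popularity
p95 = 11.063692                 # reported 95-th percentile

iqr = q3 - q1                   # interquartile range
upper_fence = q3 + 1.5 * iqr    # Tukey's rule of thumb (not part of the report)

print(f"IQR: {iqr:.6f}")
print(f"Upper fence: {upper_fence:.6f}")
print(f"95-th percentile above fence: {p95 > upper_fence}")
```

Because the 95-th percentile already lies above the fence, more than 5% of the values would be flagged as high outliers under this rule, consistent with the heavy right skew reported for this variable.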

Descriptive statistics

Standard deviation: 6.0064578
Coefficient of variation (CV): 2.0521657
Kurtosis: 1924.6212
Mean: 2.9268873
Median Absolute Deviation (MAD): 0.96857
Skewness: 29.21395
Sum: 133029.96
Variance: 36.077535
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
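
With 50 fixed-size bins, each bin spans one fiftieth of the observed range. The sketch below, a simplified stand-in for what a plotting library does internally, computes the bin width for popularity and locates the median and the 95-th percentile; the helper name `bin_index` is illustrative.

```python
def bin_index(x, lo, hi, bins=50):
    """Index of the equal-width bin containing x; the maximum falls in the last bin."""
    width = (hi - lo) / bins
    return min(int((x - lo) / width), bins - 1)

lo, hi = 0.0, 547.4883          # reported minimum and maximum of popularity
width = (hi - lo) / 50

print(f"Bin width: {width:.4f}")
print(bin_index(1.131386, lo, hi))    # median
print(bin_index(11.063692, lo, hi))   # 95-th percentile
print(bin_index(hi, lo, hi))          # maximum
```

The median sits in the first bin and the 95-th percentile only just reaches the second, so at least half of the observations, and probably close to 95% of them, pile into the first bar while the remaining bins are nearly empty.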
Value                  Count   Frequency (%)
1 × 10^-6              56      0.1%
0.000308               42      0.1%
0                      40      0.1%
0.00022                39      0.1%
0.000844               38      0.1%
0.000578               38      0.1%
0.001177               38      0.1%
0.002001               27      0.1%
0.003013               21      < 0.1%
0.00353                19      < 0.1%
Other values (43720)   45093   99.2%

Value                  Count   Frequency (%)
0                      40      0.1%
1 × 10^-6              56      0.1%
2 × 10^-6              6       < 0.1%
3 × 10^-6              6       < 0.1%
4 × 10^-6              5       < 0.1%
5 × 10^-6              1       < 0.1%
6 × 10^-6              2       < 0.1%
7 × 10^-6              1       < 0.1%
8 × 10^-6              6       < 0.1%
9 × 10^-6              2       < 0.1%

Value                  Count   Frequency (%)
547.488298             1       < 0.1%
294.337037             1       < 0.1%
287.253654             1       < 0.1%
228.032744             1       < 0.1%
213.849907             1       < 0.1%
187.860492             1       < 0.1%
185.330992             1       < 0.1%
185.070892             1       < 0.1%
183.870374             1       < 0.1%
154.801009             1       < 0.1%

production_companies
Text

Distinct: 10567
Distinct (%): 31.5%
Missing: 11875
Missing (%): 26.1%
Memory size: 355.2 KiB

Length

Max length: 91
Median length: 60
Mean length: 18.712741
Min length: 2

Characters and Unicode

Total characters: 628299
Distinct characters: 251
Distinct categories: 14
Distinct scripts: 6
Distinct blocks: 6
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 7036
Unique (%): 21.0%

Sample

1st row: Pixar Animation Studios
2nd row: TriStar Pictures
3rd row: Warner Bros.
4th row: Twentieth Century Fox Film Corporation
5th row: Sandollar Productions

Value                  Count   Frequency (%)
pictures               6304    7.5%
films                  4172    5.0%
productions            3760    4.5%
film                   3557    4.3%
entertainment          2105    2.5%
corporation            1686    2.0%
paramount              1041    1.2%
fox                    1039    1.2%
universal              964     1.2%
company                919     1.1%
Other values (9668)    58010   69.4%

Most occurring characters

ValueCountFrequency (%)
i 51228
 
8.2%
49985
 
8.0%
e 44149
 
7.0%
r 42126
 
6.7%
o 41527
 
6.6%
n 41263
 
6.6%
t 40767
 
6.5%
a 36615
 
5.8%
s 30581
 
4.9%
l 24081
 
3.8%
Other values (241) 225977
36.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 471615
75.1%
Uppercase Letter 94783
 
15.1%
Space Separator 49985
 
8.0%
Other Punctuation 3375
 
0.5%
Dash Punctuation 2668
 
0.4%
Open Punctuation 2072
 
0.3%
Close Punctuation 2072
 
0.3%
Decimal Number 1503
 
0.2%
Math Symbol 105
 
< 0.1%
Other Letter 101
 
< 0.1%
Other values (4) 20
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
i 51228
10.9%
e 44149
9.4%
r 42126
8.9%
o 41527
8.8%
n 41263
8.7%
t 40767
8.6%
a 36615
 
7.8%
s 30581
 
6.5%
l 24081
 
5.1%
u 21889
 
4.6%
Other values (93) 97389
20.7%
Other Letter
ValueCountFrequency (%)
6
 
5.9%
5
 
5.0%
5
 
5.0%
5
 
5.0%
4
 
4.0%
4
 
4.0%
4
 
4.0%
4
 
4.0%
3
 
3.0%
3
 
3.0%
Other values (47) 58
57.4%
Uppercase Letter
ValueCountFrequency (%)
P 14353
15.1%
F 12524
13.2%
C 10009
10.6%
M 7182
 
7.6%
S 5084
 
5.4%
T 4336
 
4.6%
B 4321
 
4.6%
G 4230
 
4.5%
A 4211
 
4.4%
E 4021
 
4.2%
Other values (43) 24512
25.9%
Other Punctuation
ValueCountFrequency (%)
. 2595
76.9%
& 277
 
8.2%
/ 197
 
5.8%
, 189
 
5.6%
" 94
 
2.8%
! 11
 
0.3%
\ 4
 
0.1%
% 2
 
0.1%
: 2
 
0.1%
; 1
 
< 0.1%
Other values (3) 3
 
0.1%
Decimal Number
ValueCountFrequency (%)
2 408
27.1%
0 292
19.4%
1 204
13.6%
3 156
 
10.4%
4 135
 
9.0%
9 76
 
5.1%
7 72
 
4.8%
6 64
 
4.3%
5 58
 
3.9%
8 38
 
2.5%
Open Punctuation
ValueCountFrequency (%)
( 2069
99.9%
[ 2
 
0.1%
1
 
< 0.1%
Close Punctuation
ValueCountFrequency (%)
) 2069
99.9%
] 2
 
0.1%
1
 
< 0.1%
Dash Punctuation
ValueCountFrequency (%)
- 2666
99.9%
2
 
0.1%
Final Punctuation
ValueCountFrequency (%)
» 3
75.0%
1
 
25.0%
Space Separator
ValueCountFrequency (%)
49985
100.0%
Math Symbol
ValueCountFrequency (%)
+ 105
100.0%
Other Symbol
ValueCountFrequency (%)
° 10
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Initial Punctuation
ValueCountFrequency (%)
« 3
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 566115
90.1%
Common 61800
 
9.8%
Cyrillic 253
 
< 0.1%
Hangul 88
 
< 0.1%
Greek 31
 
< 0.1%
Han 12
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
i 51228
 
9.0%
e 44149
 
7.8%
r 42126
 
7.4%
o 41527
 
7.3%
n 41263
 
7.3%
t 40767
 
7.2%
a 36615
 
6.5%
s 30581
 
5.4%
l 24081
 
4.3%
u 21889
 
3.9%
Other values (88) 191889
33.9%
Hangul
ValueCountFrequency (%)
6
 
6.8%
5
 
5.7%
5
 
5.7%
5
 
5.7%
4
 
4.5%
4
 
4.5%
4
 
4.5%
4
 
4.5%
3
 
3.4%
3
 
3.4%
Other values (34) 45
51.1%
Cyrillic
ValueCountFrequency (%)
и 28
 
11.1%
о 21
 
8.3%
с 14
 
5.5%
л 14
 
5.5%
ь 13
 
5.1%
м 13
 
5.1%
т 12
 
4.7%
н 12
 
4.7%
а 12
 
4.7%
у 11
 
4.3%
Other values (29) 103
40.7%
Common
ValueCountFrequency (%)
49985
80.9%
- 2666
 
4.3%
. 2595
 
4.2%
( 2069
 
3.3%
) 2069
 
3.3%
2 408
 
0.7%
0 292
 
0.5%
& 277
 
0.4%
1 204
 
0.3%
/ 197
 
0.3%
Other values (28) 1038
 
1.7%
Greek
ValueCountFrequency (%)
ν 3
 
9.7%
ο 3
 
9.7%
ρ 2
 
6.5%
Κ 2
 
6.5%
τ 2
 
6.5%
Ε 2
 
6.5%
λ 2
 
6.5%
η 2
 
6.5%
ι 2
 
6.5%
έ 1
 
3.2%
Other values (10) 10
32.3%
Han
ValueCountFrequency (%)
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
Other values (2) 2
16.7%

Most occurring blocks

ValueCountFrequency (%)
ASCII 626100
99.7%
None 1842
 
0.3%
Cyrillic 253
 
< 0.1%
Hangul 88
 
< 0.1%
CJK 12
 
< 0.1%
Punctuation 4
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
i 51228
 
8.2%
49985
 
8.0%
e 44149
 
7.1%
r 42126
 
6.7%
o 41527
 
6.6%
n 41263
 
6.6%
t 40767
 
6.5%
a 36615
 
5.8%
s 30581
 
4.9%
l 24081
 
3.8%
Other values (72) 223778
35.7%
None
ValueCountFrequency (%)
é 842
45.7%
á 161
 
8.7%
ó 158
 
8.6%
ô 79
 
4.3%
í 64
 
3.5%
ü 58
 
3.1%
ç 53
 
2.9%
è 47
 
2.6%
ñ 40
 
2.2%
ä 37
 
2.0%
Other values (61) 303
 
16.4%
Cyrillic
ValueCountFrequency (%)
и 28
 
11.1%
о 21
 
8.3%
с 14
 
5.5%
л 14
 
5.5%
ь 13
 
5.1%
м 13
 
5.1%
т 12
 
4.7%
н 12
 
4.7%
а 12
 
4.7%
у 11
 
4.3%
Other values (29) 103
40.7%
Hangul
ValueCountFrequency (%)
6
 
6.8%
5
 
5.7%
5
 
5.7%
5
 
5.7%
4
 
4.5%
4
 
4.5%
4
 
4.5%
4
 
4.5%
3
 
3.4%
3
 
3.4%
Other values (34) 45
51.1%
Punctuation
ValueCountFrequency (%)
2
50.0%
1
25.0%
1
25.0%
CJK
ValueCountFrequency (%)
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
1
8.3%
Other values (2) 2
16.7%

production_countries
Text

Distinct: 141
Distinct (%): 0.4%
Missing: 6220
Missing (%): 13.7%
Memory size: 355.2 KiB

Length

Max length36
Median length24
Mean length15.351125
Min length4

Characters and Unicode

Total characters602240
Distinct characters51
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 21
Unique (%) 0.1%

Sample

1st row United States of America
2nd row United States of America
3rd row United States of America
4th row United States of America
5th row United States of America
ValueCountFrequency (%)
united 21528
21.7%
states 18443
18.6%
of 18442
18.6%
america 18442
18.6%
kingdom 3072
 
3.1%
france 2716
 
2.7%
canada 1499
 
1.5%
japan 1499
 
1.5%
italy 1470
 
1.5%
germany 1427
 
1.4%
Other values (155) 10490
10.6%

Most occurring characters

ValueCountFrequency (%)
e 67297
11.2%
t 61933
 
10.3%
59797
 
9.9%
a 58261
 
9.7%
i 49059
 
8.1%
n 38007
 
6.3%
d 28630
 
4.8%
r 26207
 
4.4%
o 24663
 
4.1%
m 23738
 
3.9%
Other values (41) 164648
27.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 461881
76.7%
Uppercase Letter 80562
 
13.4%
Space Separator 59797
 
9.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 67297
14.6%
t 61933
13.4%
a 58261
12.6%
i 49059
10.6%
n 38007
8.2%
d 28630
6.2%
r 26207
 
5.7%
o 24663
 
5.3%
m 23738
 
5.1%
c 22031
 
4.8%
Other values (16) 62055
13.4%
Uppercase Letter
ValueCountFrequency (%)
U 21593
26.8%
S 20203
25.1%
A 19422
24.1%
K 4015
 
5.0%
F 3055
 
3.8%
I 2648
 
3.3%
C 2102
 
2.6%
G 1583
 
2.0%
J 1509
 
1.9%
R 1071
 
1.3%
Other values (14) 3361
 
4.2%
Space Separator
ValueCountFrequency (%)
59797
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 542443
90.1%
Common 59797
 
9.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 67297
12.4%
t 61933
11.4%
a 58261
10.7%
i 49059
 
9.0%
n 38007
 
7.0%
d 28630
 
5.3%
r 26207
 
4.8%
o 24663
 
4.5%
m 23738
 
4.4%
c 22031
 
4.1%
Other values (40) 142617
26.3%
Common
ValueCountFrequency (%)
59797
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 602240
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 67297
11.2%
t 61933
 
10.3%
59797
 
9.9%
a 58261
 
9.7%
i 49059
 
8.1%
n 38007
 
6.3%
d 28630
 
4.8%
r 26207
 
4.4%
o 24663
 
4.1%
m 23738
 
3.9%
Other values (41) 164648
27.3%
Distinct 17333
Distinct (%) 38.1%
Missing 0
Missing (%) 0.0%
Memory size 355.2 KiB
Minimum 1874-12-09 00:00:00
Maximum 2020-12-16 00:00:00
Histogram with fixed size bins (bins=50)

revenue
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct 6863
Distinct (%) 15.1%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 11219298
Minimum 0
Maximum 2.7879651 × 10^9
Zeros 38024
Zeros (%) 83.7%
Negative 0
Negative (%) 0.0%
Memory size 355.2 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 47982500
Maximum 2.7879651 × 10^9
Range 2.7879651 × 10^9
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 64339842
Coefficient of variation (CV) 5.7347477
Kurtosis 237.43708
Mean 11219298
Median Absolute Deviation (MAD) 0
Skewness 12.263643
Sum 5.099283 × 10^11
Variance 4.1396153 × 10^15
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
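
The descriptive statistics reported for revenue are related by simple arithmetic, which the following sketch checks using the rounded figures from this report. It assumes the usual definitions (CV as standard deviation divided by mean, variance as the squared standard deviation, mean as sum divided by row count) and takes the row count of 45451 from the report overview. The variable names are illustrative only.

```python
# Rounded figures copied from the revenue section of this report.
mean = 11219298.0
std = 64339842.0
total = 5.099283e11
n_rows = 45451  # number of observations; revenue has no missing values

# Coefficient of variation: spread relative to the mean.
cv = std / mean

# Variance is the square of the standard deviation.
variance = std ** 2

# With no missing values, the mean is the sum over all rows.
mean_from_sum = total / n_rows

print(f"CV             = {cv:.4f}")
print(f"variance       = {variance:.4e}")
print(f"mean (sum / n) = {mean_from_sum:.0f}")
```

A CV well above 1, as here, is consistent with the large share of zeros combined with a few very large values.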
ValueCountFrequency (%)
0 38024
83.7%
12000000 20
 
< 0.1%
11000000 19
 
< 0.1%
10000000 19
 
< 0.1%
2000000 18
 
< 0.1%
6000000 17
 
< 0.1%
5000000 14
 
< 0.1%
8000000 13
 
< 0.1%
500000 13
 
< 0.1%
14000000 12
 
< 0.1%
Other values (6853) 7282
 
16.0%
ValueCountFrequency (%)
0 38024
83.7%
1 12
 
< 0.1%
2 3
 
< 0.1%
3 9
 
< 0.1%
4 4
 
< 0.1%
5 5
 
< 0.1%
6 2
 
< 0.1%
7 4
 
< 0.1%
8 5
 
< 0.1%
9 1
 
< 0.1%
ValueCountFrequency (%)
2787965087 1
< 0.1%
2068223624 1
< 0.1%
1845034188 1
< 0.1%
1519557910 1
< 0.1%
1513528810 1
< 0.1%
1506249360 1
< 0.1%
1405403694 1
< 0.1%
1342000000 1
< 0.1%
1274219009 1
< 0.1%
1262886337 1
< 0.1%

runtime
Real number (ℝ)

Distinct 353
Distinct (%) 0.8%
Missing 246
Missing (%) 0.5%
Infinite 0
Infinite (%) 0.0%
Mean 94.181905
Minimum 0
Maximum 1256
Zeros 1535
Zeros (%) 3.4%
Negative 0
Negative (%) 0.0%
Memory size 355.2 KiB

Quantile statistics

Minimum 0
5-th percentile 12
Q1 85
median 95
Q3 107
95-th percentile 138
Maximum 1256
Range 1256
Interquartile range (IQR) 22

Descriptive statistics

Standard deviation 38.329504
Coefficient of variation (CV) 0.40697312
Kurtosis 93.884668
Mean 94.181905
Median Absolute Deviation (MAD) 11
Skewness 4.486616
Sum 4257493
Variance 1469.1509
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
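
Because runtime has missing values, its mean is consistent with being taken over the non-missing rows only. The sketch below reproduces the reported figures from the counts above. It assumes the percentages are expressed relative to all 45451 observations (the total from the report overview), and the variable names are illustrative.

```python
# Figures copied from the runtime section of this report.
n_rows = 45451      # total observations
n_missing = 246     # rows with no runtime
n_zeros = 1535      # rows with runtime == 0
total = 4257493     # sum of all non-missing runtimes

# Rows that actually contribute to the mean.
n_valid = n_rows - n_missing

mean = total / n_valid
missing_pct = 100 * n_missing / n_rows
zeros_pct = 100 * n_zeros / n_rows

print(f"valid rows  = {n_valid}")
print(f"mean        = {mean:.6f}")
print(f"missing (%) = {missing_pct:.1f}")
print(f"zeros (%)   = {zeros_pct:.1f}")
```

Dividing the sum by all 45451 rows instead would give about 93.67, which does not match the reported mean. Note that zeros still count as valid values here, so they pull the mean down even though a runtime of 0 probably stands for an unknown value.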
ValueCountFrequency (%)
90 2552
 
5.6%
0 1535
 
3.4%
100 1471
 
3.2%
95 1412
 
3.1%
93 1219
 
2.7%
96 1104
 
2.4%
92 1081
 
2.4%
94 1064
 
2.3%
91 1056
 
2.3%
88 1030
 
2.3%
Other values (343) 31681
69.7%
ValueCountFrequency (%)
0 1535
3.4%
1 107
 
0.2%
2 34
 
0.1%
3 49
 
0.1%
4 50
 
0.1%
5 51
 
0.1%
6 72
 
0.2%
7 103
 
0.2%
8 78
 
0.2%
9 63
 
0.1%
ValueCountFrequency (%)
1256 1
< 0.1%
1140 2
< 0.1%
931 1
< 0.1%
925 1
< 0.1%
900 1
< 0.1%
877 1
< 0.1%
874 1
< 0.1%
840 2
< 0.1%
780 1
< 0.1%
720 1
< 0.1%

spoken_languages
Text

Distinct 72
Distinct (%) 0.2%
Missing 3995
Missing (%) 8.8%
Memory size 355.2 KiB

Length

Max length 16
Median length 7
Mean length 6.9373794
Min length 3

Characters and Unicode

Total characters 287596
Distinct characters 172
Distinct categories 8
Distinct scripts 15
Distinct blocks 16
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 4
Unique (%) < 0.1%

Sample

1st row English
2nd row English
3rd row English
4th row English
5th row English
ValueCountFrequency (%)
english 26861
63.0%
français 2436
 
5.7%
italiano 1410
 
3.3%
日本語 1392
 
3.3%
deutsch 1303
 
3.1%
español 1143
 
2.7%
pусский 905
 
2.1%
हिन्दी 549
 
1.3%
한국어/조선말 446
 
1.0%
普通话 413
 
1.0%
Other values (67) 5788
 
13.6%

Most occurring characters

ValueCountFrequency (%)
s 34624
12.0%
n 32233
11.2%
i 31681
11.0%
l 30065
10.5%
h 28221
9.8%
E 28047
9.8%
g 27971
9.7%
a 11115
 
3.9%
o 4026
 
1.4%
r 3576
 
1.2%
Other values (162) 56037
19.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 230270
80.1%
Uppercase Letter 36967
 
12.9%
Other Letter 15948
 
5.5%
Spacing Mark 1402
 
0.5%
Space Separator 1190
 
0.4%
Nonspacing Mark 905
 
0.3%
Other Punctuation 899
 
0.3%
Decimal Number 15
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 34624
15.0%
n 32233
14.0%
i 31681
13.8%
l 30065
13.1%
h 28221
12.3%
g 27971
12.1%
a 11115
 
4.8%
o 4026
 
1.7%
r 3576
 
1.6%
t 3218
 
1.4%
Other values (64) 23540
10.2%
Other Letter
ValueCountFrequency (%)
1392
 
8.7%
1392
 
8.7%
1392
 
8.7%
818
 
5.1%
810
 
5.1%
549
 
3.4%
549
 
3.4%
549
 
3.4%
446
 
2.8%
446
 
2.8%
Other values (46) 7605
47.7%
Uppercase Letter
ValueCountFrequency (%)
E 28047
75.9%
F 2437
 
6.6%
D 1582
 
4.3%
P 1478
 
4.0%
I 1410
 
3.8%
N 701
 
1.9%
L 360
 
1.0%
Č 266
 
0.7%
T 164
 
0.4%
M 142
 
0.4%
Other values (13) 380
 
1.0%
Spacing Mark
ValueCountFrequency (%)
549
39.2%
ि 549
39.2%
86
 
6.1%
86
 
6.1%
ி 81
 
5.8%
43
 
3.1%
4
 
0.3%
4
 
0.3%
Nonspacing Mark
ValueCountFrequency (%)
549
60.7%
ִ 152
 
16.8%
81
 
9.0%
ְ 76
 
8.4%
43
 
4.8%
4
 
0.4%
Other Punctuation
ValueCountFrequency (%)
/ 851
94.7%
? 33
 
3.7%
\ 15
 
1.7%
Space Separator
ValueCountFrequency (%)
1190
100.0%
Decimal Number
ValueCountFrequency (%)
9 15
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 260027
90.4%
Han 7845
 
2.7%
Cyrillic 5999
 
2.1%
Devanagari 3294
 
1.1%
Hangul 2676
 
0.9%
Arabic 2433
 
0.8%
Common 2104
 
0.7%
Greek 1064
 
0.4%
Hebrew 608
 
0.2%
Thai 497
 
0.2%
Other values (5) 1049
 
0.4%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 34624
13.3%
n 32233
12.4%
i 31681
12.2%
l 30065
11.6%
h 28221
10.9%
E 28047
10.8%
g 27971
10.8%
a 11115
 
4.3%
o 4026
 
1.5%
r 3576
 
1.4%
Other values (51) 28468
10.9%
Cyrillic
ValueCountFrequency (%)
с 1853
30.9%
к 989
16.5%
и 971
16.2%
й 921
15.4%
у 907
15.1%
а 63
 
1.1%
р 43
 
0.7%
з 33
 
0.6%
б 27
 
0.5%
е 27
 
0.5%
Other values (12) 165
 
2.8%
Arabic
ValueCountFrequency (%)
ا 382
15.7%
ر 382
15.7%
ل 265
10.9%
ع 265
10.9%
ب 265
10.9%
ي 265
10.9%
ة 265
10.9%
ف 102
 
4.2%
س 102
 
4.2%
ی 102
 
4.2%
Other values (5) 38
 
1.6%
Han
ValueCountFrequency (%)
1392
17.7%
1392
17.7%
1392
17.7%
818
10.4%
810
10.3%
413
 
5.3%
413
 
5.3%
广 405
 
5.2%
405
 
5.2%
405
 
5.2%
Greek
ValueCountFrequency (%)
λ 266
25.0%
ε 133
12.5%
η 133
12.5%
ν 133
12.5%
ι 133
12.5%
κ 133
12.5%
ά 133
12.5%
Hebrew
ValueCountFrequency (%)
ִ 152
25.0%
ְ 76
12.5%
ע 76
12.5%
ב 76
12.5%
ר 76
12.5%
ת 76
12.5%
י 76
12.5%
Georgian
ValueCountFrequency (%)
21
14.3%
21
14.3%
21
14.3%
21
14.3%
21
14.3%
21
14.3%
21
14.3%
Devanagari
ValueCountFrequency (%)
549
16.7%
549
16.7%
ि 549
16.7%
549
16.7%
549
16.7%
549
16.7%
Hangul
ValueCountFrequency (%)
446
16.7%
446
16.7%
446
16.7%
446
16.7%
446
16.7%
446
16.7%
Thai
ValueCountFrequency (%)
142
28.6%
71
14.3%
71
14.3%
71
14.3%
71
14.3%
71
14.3%
Gurmukhi
ValueCountFrequency (%)
4
16.7%
4
16.7%
4
16.7%
4
16.7%
4
16.7%
4
16.7%
Common
ValueCountFrequency (%)
1190
56.6%
/ 851
40.4%
? 33
 
1.6%
9 15
 
0.7%
\ 15
 
0.7%
Telugu
ValueCountFrequency (%)
86
33.3%
43
16.7%
43
16.7%
43
16.7%
43
16.7%
Tamil
ValueCountFrequency (%)
81
20.0%
81
20.0%
81
20.0%
ி 81
20.0%
81
20.0%
Bengali
ValueCountFrequency (%)
86
40.0%
43
20.0%
43
20.0%
43
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 257095
89.4%
CJK 7845
 
2.7%
None 6066
 
2.1%
Cyrillic 5999
 
2.1%
Devanagari 3294
 
1.1%
Hangul 2676
 
0.9%
Arabic 2433
 
0.8%
Hebrew 608
 
0.2%
Thai 497
 
0.2%
Tamil 405
 
0.1%
Other values (6) 678
 
0.2%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 34624
13.5%
n 32233
12.5%
i 31681
12.3%
l 30065
11.7%
h 28221
11.0%
E 28047
10.9%
g 27971
10.9%
a 11115
 
4.3%
o 4026
 
1.6%
r 3576
 
1.4%
Other values (40) 25536
9.9%
None
ValueCountFrequency (%)
ç 2585
42.6%
ñ 1143
18.8%
ê 329
 
5.4%
ý 266
 
4.4%
Č 266
 
4.4%
λ 266
 
4.4%
ü 149
 
2.5%
ε 133
 
2.2%
η 133
 
2.2%
ν 133
 
2.2%
Other values (10) 663
 
10.9%
Cyrillic
ValueCountFrequency (%)
с 1853
30.9%
к 989
16.5%
и 971
16.2%
й 921
15.4%
у 907
15.1%
а 63
 
1.1%
р 43
 
0.7%
з 33
 
0.6%
б 27
 
0.5%
е 27
 
0.5%
Other values (12) 165
 
2.8%
CJK
ValueCountFrequency (%)
1392
17.7%
1392
17.7%
1392
17.7%
818
10.4%
810
10.3%
413
 
5.3%
413
 
5.3%
广 405
 
5.2%
405
 
5.2%
405
 
5.2%
Devanagari
ValueCountFrequency (%)
549
16.7%
549
16.7%
ि 549
16.7%
549
16.7%
549
16.7%
549
16.7%
Hangul
ValueCountFrequency (%)
446
16.7%
446
16.7%
446
16.7%
446
16.7%
446
16.7%
446
16.7%
Arabic
ValueCountFrequency (%)
ا 382
15.7%
ر 382
15.7%
ل 265
10.9%
ع 265
10.9%
ب 265
10.9%
ي 265
10.9%
ة 265
10.9%
ف 102
 
4.2%
س 102
 
4.2%
ی 102
 
4.2%
Other values (5) 38
 
1.6%
Hebrew
ValueCountFrequency (%)
ִ 152
25.0%
ְ 76
12.5%
ע 76
12.5%
ב 76
12.5%
ר 76
12.5%
ת 76
12.5%
י 76
12.5%
Thai
ValueCountFrequency (%)
142
28.6%
71
14.3%
71
14.3%
71
14.3%
71
14.3%
71
14.3%
Bengali
ValueCountFrequency (%)
86
40.0%
43
20.0%
43
20.0%
43
20.0%
Telugu
ValueCountFrequency (%)
86
33.3%
43
16.7%
43
16.7%
43
16.7%
43
16.7%
Tamil
ValueCountFrequency (%)
81
20.0%
81
20.0%
81
20.0%
ி 81
20.0%
81
20.0%
Georgian
ValueCountFrequency (%)
21
14.3%
21
14.3%
21
14.3%
21
14.3%
21
14.3%
21
14.3%
21
14.3%
Latin Ext Additional
ValueCountFrequency (%)
15
50.0%
ế 15
50.0%
IPA Ext
ValueCountFrequency (%)
ə 4
100.0%
Gurmukhi
ValueCountFrequency (%)
4
16.7%
4
16.7%
4
16.7%
4
16.7%
4
16.7%
4
16.7%

status
Categorical

Distinct 6
Distinct (%) < 0.1%
Missing 80
Missing (%) 0.2%
Memory size 355.2 KiB

Released 45009
Rumored 232
Post Production 97
In Production 19
Planned 13

Length

Max length 15
Median length 8
Mean length 8.0116594
Min length 7

Characters and Unicode

Total characters 363497
Distinct characters 18
Distinct categories 3
Distinct scripts 2
Distinct blocks 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 1
Unique (%) < 0.1%

Sample

1st row Released
2nd row Released
3rd row Released
4th row Released
5th row Released

Common Values

Value              Count   Frequency (%)
Released           45009   99.0%
Rumored              232   0.5%
Post Production       97   0.2%
In Production         19   < 0.1%
Planned               13   < 0.1%
Canceled               1   < 0.1%
(Missing)             80   0.2%
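
The report overview flags status as highly imbalanced (97.0%). One common way to quantify class imbalance is one minus the normalised Shannon entropy of the class counts. The sketch below applies that definition to the counts in the table above. Treating this as the formula behind the alert is an assumption (although the result does round to the same figure), and `imbalance_score` is a hypothetical helper name.

```python
import math

# Class counts copied from the Common Values table (missing rows excluded).
counts = {
    "Released": 45009,
    "Rumored": 232,
    "Post Production": 97,
    "In Production": 19,
    "Planned": 13,
    "Canceled": 1,
}
n_missing = 80


def imbalance_score(class_counts):
    """Return 1 - H / log2(k): 0 for a uniform split, near 1 for one dominant class."""
    total = sum(class_counts)
    k = len(class_counts)
    if k < 2:
        return 0.0
    entropy = -sum((c / total) * math.log2(c / total) for c in class_counts if c > 0)
    return 1.0 - entropy / math.log2(k)


score = imbalance_score(list(counts.values()))

# Frequency of the dominant class relative to all rows, missing included.
n_rows = sum(counts.values()) + n_missing
released_pct = 100 * counts["Released"] / n_rows

print(f"imbalance score = {score:.3f}")
print(f"Released share  = {released_pct:.1f}%")
```

With almost all rows in a single class, status carries very little information and is a candidate for dropping or for collapsing into a binary released / not released flag.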

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
released 45009
98.9%
rumored 232
 
0.5%
production 116
 
0.3%
post 97
 
0.2%
in 19
 
< 0.1%
planned 13
 
< 0.1%
canceled 1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e 135274
37.2%
d 45371
 
12.5%
R 45241
 
12.4%
s 45106
 
12.4%
l 45023
 
12.4%
a 45023
 
12.4%
o 561
 
0.2%
r 348
 
0.1%
u 348
 
0.1%
m 232
 
0.1%
Other values (8) 970
 
0.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 317894
87.5%
Uppercase Letter 45487
 
12.5%
Space Separator 116
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 135274
42.6%
d 45371
 
14.3%
s 45106
 
14.2%
l 45023
 
14.2%
a 45023
 
14.2%
o 561
 
0.2%
r 348
 
0.1%
u 348
 
0.1%
m 232
 
0.1%
t 213
 
0.1%
Other values (3) 395
 
0.1%
Uppercase Letter
ValueCountFrequency (%)
R 45241
99.5%
P 226
 
0.5%
I 19
 
< 0.1%
C 1
 
< 0.1%
Space Separator
ValueCountFrequency (%)
116
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 363381
> 99.9%
Common 116
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 135274
37.2%
d 45371
 
12.5%
R 45241
 
12.5%
s 45106
 
12.4%
l 45023
 
12.4%
a 45023
 
12.4%
o 561
 
0.2%
r 348
 
0.1%
u 348
 
0.1%
m 232
 
0.1%
Other values (7) 854
 
0.2%
Common
ValueCountFrequency (%)
116
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 363497
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 135274
37.2%
d 45371
 
12.5%
R 45241
 
12.4%
s 45106
 
12.4%
l 45023
 
12.4%
a 45023
 
12.4%
o 561
 
0.2%
r 348
 
0.1%
u 348
 
0.1%
m 232
 
0.1%
Other values (8) 970
 
0.3%

tagline
Text

Distinct 20269
Distinct (%) 99.2%
Missing 25026
Missing (%) 55.1%
Memory size 355.2 KiB

Length

Max length 297
Median length 204
Mean length 47.003231
Min length 1

Characters and Unicode

Total characters 960041
Distinct characters 170
Distinct categories 17
Distinct scripts 6
Distinct blocks 10
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 20160
Unique (%) 98.7%

Sample

1st row Roll the dice and unleash the excitement!
2nd row Still Yelling. Still Fighting. Still Ready for Love.
3rd row Friends are the people who let you be yourself... and never let you forget it.
4th row Just When His World Is Back To Normal... He's In For The Surprise Of His Life!
5th row A Los Angeles Crime Saga
ValueCountFrequency (%)
the 11025
 
6.3%
a 6826
 
3.9%
of 4410
 
2.5%
to 3592
 
2.1%
is 2804
 
1.6%
in 2698
 
1.5%
and 2684
 
1.5%
you 2392
 
1.4%
1588
 
0.9%
for 1524
 
0.9%
Other values (15100) 134648
77.3%

Most occurring characters

ValueCountFrequency (%)
153914
16.0%
e 94574
 
9.9%
t 57367
 
6.0%
o 56644
 
5.9%
a 51524
 
5.4%
n 47583
 
5.0%
i 46091
 
4.8%
r 45082
 
4.7%
s 42396
 
4.4%
h 37238
 
3.9%
Other values (160) 327628
34.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 681492
71.0%
Space Separator 153914
 
16.0%
Uppercase Letter 75054
 
7.8%
Other Punctuation 44624
 
4.6%
Decimal Number 2687
 
0.3%
Dash Punctuation 1950
 
0.2%
Final Punctuation 98
 
< 0.1%
Open Punctuation 56
 
< 0.1%
Close Punctuation 55
 
< 0.1%
Currency Symbol 37
 
< 0.1%
Other values (7) 74
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 94574
13.9%
t 57367
 
8.4%
o 56644
 
8.3%
a 51524
 
7.6%
n 47583
 
7.0%
i 46091
 
6.8%
r 45082
 
6.6%
s 42396
 
6.2%
h 37238
 
5.5%
l 30206
 
4.4%
Other values (43) 172787
25.4%
Other Letter
ValueCountFrequency (%)
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
1
 
2.9%
Other values (24) 24
70.6%
Uppercase Letter
ValueCountFrequency (%)
T 10013
 
13.3%
A 6881
 
9.2%
S 5662
 
7.5%
H 4405
 
5.9%
I 4388
 
5.8%
E 4310
 
5.7%
W 3689
 
4.9%
O 3480
 
4.6%
N 3200
 
4.3%
L 3196
 
4.3%
Other values (20) 25830
34.4%
Other Punctuation
ValueCountFrequency (%)
. 26666
59.8%
! 5784
 
13.0%
' 5678
 
12.7%
, 4234
 
9.5%
? 1167
 
2.6%
" 582
 
1.3%
148
 
0.3%
: 140
 
0.3%
& 83
 
0.2%
* 42
 
0.1%
Other values (7) 100
 
0.2%
Decimal Number
ValueCountFrequency (%)
0 802
29.8%
1 516
19.2%
2 299
 
11.1%
9 208
 
7.7%
3 208
 
7.7%
5 168
 
6.3%
4 140
 
5.2%
7 121
 
4.5%
6 121
 
4.5%
8 104
 
3.9%
Math Symbol
ValueCountFrequency (%)
+ 5
35.7%
= 5
35.7%
| 2
 
14.3%
~ 1
 
7.1%
1
 
7.1%
Dash Punctuation
ValueCountFrequency (%)
- 1933
99.1%
9
 
0.5%
8
 
0.4%
Final Punctuation
ValueCountFrequency (%)
82
83.7%
15
 
15.3%
» 1
 
1.0%
Initial Punctuation
ValueCountFrequency (%)
14
73.7%
4
 
21.1%
« 1
 
5.3%
Open Punctuation
ValueCountFrequency (%)
( 49
87.5%
[ 7
 
12.5%
Close Punctuation
ValueCountFrequency (%)
) 48
87.3%
] 7
 
12.7%
Other Number
ValueCountFrequency (%)
½ 2
66.7%
² 1
33.3%
Modifier Letter
ValueCountFrequency (%)
ˌ 1
50.0%
ˈ 1
50.0%
Space Separator
ValueCountFrequency (%)
153914
100.0%
Currency Symbol
ValueCountFrequency (%)
$ 37
100.0%
Nonspacing Mark
ValueCountFrequency (%)
1
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 756546
78.8%
Common 203460
 
21.2%
Han 21
 
< 0.1%
Tamil 5
 
< 0.1%
Hiragana 5
 
< 0.1%
Katakana 4
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 94574
 
12.5%
t 57367
 
7.6%
o 56644
 
7.5%
a 51524
 
6.8%
n 47583
 
6.3%
i 46091
 
6.1%
r 45082
 
6.0%
s 42396
 
5.6%
h 37238
 
4.9%
l 30206
 
4.0%
Other values (73) 247841
32.8%
Common
ValueCountFrequency (%)
153914
75.6%
. 26666
 
13.1%
! 5784
 
2.8%
' 5678
 
2.8%
, 4234
 
2.1%
- 1933
 
1.0%
? 1167
 
0.6%
0 802
 
0.4%
" 582
 
0.3%
1 516
 
0.3%
Other values (42) 2184
 
1.1%
Han
ValueCountFrequency (%)
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Other values (11) 11
52.4%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Hiragana
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 959609
> 99.9%
Punctuation 280
 
< 0.1%
None 112
 
< 0.1%
CJK 21
 
< 0.1%
Tamil 5
 
< 0.1%
Hiragana 5
 
< 0.1%
Katakana 4
 
< 0.1%
IPA Ext 2
 
< 0.1%
Modifier Letters 2
 
< 0.1%
Math Operators 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
153914
16.0%
e 94574
 
9.9%
t 57367
 
6.0%
o 56644
 
5.9%
a 51524
 
5.4%
n 47583
 
5.0%
i 46091
 
4.8%
r 45082
 
4.7%
s 42396
 
4.4%
h 37238
 
3.9%
Other values (78) 327196
34.1%
Punctuation
ValueCountFrequency (%)
148
52.9%
82
29.3%
15
 
5.4%
14
 
5.0%
9
 
3.2%
8
 
2.9%
4
 
1.4%
None
ValueCountFrequency (%)
é 20
17.9%
ä 16
14.3%
ö 8
 
7.1%
á 6
 
5.4%
ó 6
 
5.4%
í 5
 
4.5%
ü 5
 
4.5%
ı 5
 
4.5%
· 4
 
3.6%
ć 3
 
2.7%
Other values (26) 34
30.4%
IPA Ext
ValueCountFrequency (%)
ə 2
100.0%
Tamil
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
CJK
ValueCountFrequency (%)
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
1
 
4.8%
Other values (11) 11
52.4%
Katakana
ValueCountFrequency (%)
1
25.0%
1
25.0%
1
25.0%
1
25.0%
Modifier Letters
ValueCountFrequency (%)
ˌ 1
50.0%
ˈ 1
50.0%
Hiragana
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Math Operators
ValueCountFrequency (%)
1
100.0%

title
Text

Distinct 42195
Distinct (%) 92.8%
Missing 0
Missing (%) 0.0%
Memory size 355.2 KiB

Length

Max length 105
Median length 79
Mean length 16.700623
Min length 1

Characters and Unicode

Total characters 759060
Distinct characters 287
Distinct categories 17
Distinct scripts 7
Distinct blocks 12
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique 39856
Unique (%) 87.7%

Sample

1st row Toy Story
2nd row Jumanji
3rd row Grumpier Old Men
4th row Waiting to Exhale
5th row Father of the Bride Part II
ValueCountFrequency (%)
the 14577
 
10.7%
of 4944
 
3.6%
a 2248
 
1.6%
in 1693
 
1.2%
and 1637
 
1.2%
to 1056
 
0.8%
760
 
0.6%
man 665
 
0.5%
love 664
 
0.5%
for 601
 
0.4%
Other values (24353) 107545
78.9%

Most occurring characters

ValueCountFrequency (%)
90961
 
12.0%
e 76381
 
10.1%
a 49019
 
6.5%
o 45759
 
6.0%
n 40872
 
5.4%
r 40082
 
5.3%
i 39803
 
5.2%
t 36764
 
4.8%
s 29568
 
3.9%
h 28558
 
3.8%
Other values (277) 281293
37.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 535007
70.5%
Uppercase Letter 117434
 
15.5%
Space Separator 90961
 
12.0%
Other Punctuation 10499
 
1.4%
Decimal Number 3860
 
0.5%
Dash Punctuation 985
 
0.1%
Close Punctuation 87
 
< 0.1%
Open Punctuation 85
 
< 0.1%
Final Punctuation 38
 
< 0.1%
Other Letter 25
 
< 0.1%
Other values (7) 79
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 76381
14.3%
a 49019
9.2%
o 45759
 
8.6%
n 40872
 
7.6%
r 40082
 
7.5%
i 39803
 
7.4%
t 36764
 
6.9%
s 29568
 
5.5%
h 28558
 
5.3%
l 25973
 
4.9%
Other values (121) 122228
22.8%
Uppercase Letter
ValueCountFrequency (%)
T 16037
13.7%
S 10346
 
8.8%
M 8035
 
6.8%
B 7676
 
6.5%
C 7184
 
6.1%
A 6792
 
5.8%
D 6348
 
5.4%
L 5879
 
5.0%
W 5174
 
4.4%
H 5170
 
4.4%
Other values (65) 38793
33.0%
Other Letter
ValueCountFrequency (%)
چ 2
 
8.0%
ه 2
 
8.0%
ک 2
 
8.0%
ی 2
 
8.0%
ª 1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
1
 
4.0%
Other values (11) 11
44.0%
Other Punctuation
ValueCountFrequency (%)
: 3725
35.5%
' 2504
23.8%
. 1603
15.3%
, 1137
 
10.8%
! 647
 
6.2%
& 458
 
4.4%
? 269
 
2.6%
/ 79
 
0.8%
* 19
 
0.2%
# 13
 
0.1%
Other values (8) 45
 
0.4%
Decimal Number
ValueCountFrequency (%)
2 861
22.3%
1 701
18.2%
0 616
16.0%
3 482
12.5%
9 232
 
6.0%
4 231
 
6.0%
5 227
 
5.9%
7 193
 
5.0%
8 161
 
4.2%
6 156
 
4.0%
Math Symbol
ValueCountFrequency (%)
+ 17
70.8%
× 3
 
12.5%
1
 
4.2%
= 1
 
4.2%
1
 
4.2%
1
 
4.2%
Other Number
ValueCountFrequency (%)
½ 12
63.2%
² 3
 
15.8%
³ 2
 
10.5%
1
 
5.3%
1
 
5.3%
Other Symbol
ValueCountFrequency (%)
° 3
37.5%
2
25.0%
1
 
12.5%
1
 
12.5%
1
 
12.5%
Currency Symbol
ValueCountFrequency (%)
$ 18
85.7%
¢ 2
 
9.5%
£ 1
 
4.8%
Dash Punctuation
ValueCountFrequency (%)
- 970
98.5%
15
 
1.5%
Close Punctuation
ValueCountFrequency (%)
) 82
94.3%
] 5
 
5.7%
Open Punctuation
ValueCountFrequency (%)
( 80
94.1%
[ 5
 
5.9%
Final Punctuation
ValueCountFrequency (%)
37
97.4%
1
 
2.6%
Initial Punctuation
ValueCountFrequency (%)
1
50.0%
1
50.0%
Space Separator
ValueCountFrequency (%)
90961
100.0%
Connector Punctuation
ValueCountFrequency (%)
_ 3
100.0%
Format
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 651926
85.9%
Common 106594
 
14.0%
Cyrillic 346
 
< 0.1%
Greek 170
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%
Han 5
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 76381
 
11.7%
a 49019
 
7.5%
o 45759
 
7.0%
n 40872
 
6.3%
r 40082
 
6.1%
i 39803
 
6.1%
t 36764
 
5.6%
s 29568
 
4.5%
h 28558
 
4.4%
l 25973
 
4.0%
Other values (107) 239147
36.7%
Common
ValueCountFrequency (%)
90961
85.3%
: 3725
 
3.5%
' 2504
 
2.3%
. 1603
 
1.5%
, 1137
 
1.1%
- 970
 
0.9%
2 861
 
0.8%
1 701
 
0.7%
! 647
 
0.6%
0 616
 
0.6%
Other values (50) 2869
 
2.7%
Cyrillic
ValueCountFrequency (%)
е 32
 
9.2%
о 32
 
9.2%
а 29
 
8.4%
н 24
 
6.9%
и 23
 
6.6%
р 22
 
6.4%
к 17
 
4.9%
с 15
 
4.3%
л 14
 
4.0%
в 14
 
4.0%
Other values (38) 124
35.8%
Greek
ValueCountFrequency (%)
α 20
 
11.8%
ο 14
 
8.2%
ι 14
 
8.2%
τ 9
 
5.3%
ά 8
 
4.7%
ρ 8
 
4.7%
λ 8
 
4.7%
ν 7
 
4.1%
ς 6
 
3.5%
ε 6
 
3.5%
Other values (32) 70
41.2%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arabic
ValueCountFrequency (%)
چ 2
18.2%
ه 2
18.2%
ک 2
18.2%
ی 2
18.2%
س 1
9.1%
ا 1
9.1%
ج 1
9.1%
Han
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 757486
99.8%
None 1133
 
0.1%
Cyrillic 346
 
< 0.1%
Punctuation 62
 
< 0.1%
Arabic 11
 
< 0.1%
Katakana 8
 
< 0.1%
CJK 5
 
< 0.1%
Misc Symbols 3
 
< 0.1%
Math Operators 2
 
< 0.1%
Letterlike Symbols 2
 
< 0.1%
Other values (2) 2
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
90961
 
12.0%
e 76381
 
10.1%
a 49019
 
6.5%
o 45759
 
6.0%
n 40872
 
5.4%
r 40082
 
5.3%
i 39803
 
5.3%
t 36764
 
4.9%
s 29568
 
3.9%
h 28558
 
3.8%
Other values (76) 279719
36.9%
None
ValueCountFrequency (%)
é 222
19.6%
ä 128
 
11.3%
ö 57
 
5.0%
è 53
 
4.7%
ô 44
 
3.9%
ü 39
 
3.4%
ó 37
 
3.3%
á 35
 
3.1%
ı 35
 
3.1%
í 33
 
2.9%
Other values (108) 450
39.7%
Punctuation
ValueCountFrequency (%)
37
59.7%
15
24.2%
5
 
8.1%
2
 
3.2%
1
 
1.6%
1
 
1.6%
1
 
1.6%
Cyrillic
ValueCountFrequency (%)
е 32
 
9.2%
о 32
 
9.2%
а 29
 
8.4%
н 24
 
6.9%
и 23
 
6.6%
р 22
 
6.4%
к 17
 
4.9%
с 15
 
4.3%
л 14
 
4.0%
в 14
 
4.0%
Other values (38) 124
35.8%
Arabic
ValueCountFrequency (%)
چ 2
18.2%
ه 2
18.2%
ک 2
18.2%
ی 2
18.2%
س 1
9.1%
ا 1
9.1%
ج 1
9.1%
Misc Symbols
ValueCountFrequency (%)
2
66.7%
1
33.3%
CJK
ValueCountFrequency (%)
1
20.0%
1
20.0%
1
20.0%
1
20.0%
1
20.0%
Number Forms
ValueCountFrequency (%)
1
100.0%
Math Operators
ValueCountFrequency (%)
1
50.0%
1
50.0%
Letterlike Symbols
ValueCountFrequency (%)
1
50.0%
1
50.0%
Katakana
ValueCountFrequency (%)
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
1
12.5%
Arrows
ValueCountFrequency (%)
1
100.0%

vote_average
Real number (ℝ)

Distinct 92
Distinct (%) 0.2%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 5.6240853
Minimum 0
Maximum 10
Zeros 2953
Zeros (%) 6.5%
Negative 0
Negative (%) 0.0%
Memory size 355.2 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 5
median 6
Q3 6.8
95-th percentile 7.8
Maximum 10
Range 10
Interquartile range (IQR) 1.8

Descriptive statistics

Standard deviation 1.9154242
Coefficient of variation (CV) 0.34057524
Kurtosis 2.543282
Mean 5.6240853
Median Absolute Deviation (MAD) 0.9
Skewness -1.5249159
Sum 255620.3
Variance 3.6688498
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 2953
 
6.5%
6 2465
 
5.4%
5 2006
 
4.4%
7 1885
 
4.1%
6.5 1722
 
3.8%
6.3 1605
 
3.5%
5.5 1383
 
3.0%
5.8 1370
 
3.0%
6.4 1354
 
3.0%
6.7 1351
 
3.0%
Other values (82) 27357
60.2%
ValueCountFrequency (%)
0 2953
6.5%
0.5 13
 
< 0.1%
0.7 1
 
< 0.1%
1 103
 
0.2%
1.1 1
 
< 0.1%
1.2 4
 
< 0.1%
1.3 13
 
< 0.1%
1.4 5
 
< 0.1%
1.5 30
 
0.1%
1.6 6
 
< 0.1%
ValueCountFrequency (%)
10 185
0.4%
9.8 1
 
< 0.1%
9.6 1
 
< 0.1%
9.5 18
 
< 0.1%
9.4 3
 
< 0.1%
9.3 18
 
< 0.1%
9.2 4
 
< 0.1%
9.1 2
 
< 0.1%
9 159
0.3%
8.9 7
 
< 0.1%

vote_count
Real number (ℝ)

HIGH CORRELATION  ZEROS 

Distinct 1820
Distinct (%) 4.0%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 109.9897
Minimum 0
Maximum 14075
Zeros 2855
Zeros (%) 6.3%
Negative 0
Negative (%) 0.0%
Memory size 355.2 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 3
median 10
Q3 34
95-th percentile 434
Maximum 14075
Range 14075
Interquartile range (IQR) 31

Descriptive statistics

Standard deviation 491.35234
Coefficient of variation (CV) 4.4672576
Kurtosis 151.1728
Mean 109.9897
Median Absolute Deviation (MAD) 8
Skewness 10.449072
Sum 4999142
Variance 241427.12
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1 3246
 
7.1%
2 3128
 
6.9%
0 2855
 
6.3%
3 2797
 
6.2%
4 2480
 
5.5%
5 2099
 
4.6%
6 1747
 
3.8%
7 1574
 
3.5%
8 1360
 
3.0%
9 1195
 
2.6%
Other values (1810) 22970
50.5%
ValueCountFrequency (%)
0 2855
6.3%
1 3246
7.1%
2 3128
6.9%
3 2797
6.2%
4 2480
5.5%
5 2099
4.6%
6 1747
3.8%
7 1574
3.5%
8 1360
3.0%
9 1195
 
2.6%
ValueCountFrequency (%)
14075 1
< 0.1%
12269 1
< 0.1%
12114 1
< 0.1%
12000 1
< 0.1%
11444 1
< 0.1%
11187 1
< 0.1%
10297 1
< 0.1%
10014 1
< 0.1%
9678 1
< 0.1%
9634 1
< 0.1%

release_year
Real number (ℝ)

Distinct 135
Distinct (%) 0.3%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 1991.882
Minimum 1874
Maximum 2020
Zeros 0
Zeros (%) 0.0%
Negative 0
Negative (%) 0.0%
Memory size 355.2 KiB

Quantile statistics

Minimum 1874
5-th percentile 1941
Q1 1978
median 2001
Q3 2010
95-th percentile 2015
Maximum 2020
Range 146
Interquartile range (IQR) 32

Descriptive statistics

Standard deviation 24.057726
Coefficient of variation (CV) 0.012077887
Kurtosis 0.84065965
Mean 1991.882
Median Absolute Deviation (MAD) 12
Skewness -1.2253824
Sum 90533030
Variance 578.7742
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
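
The Range and IQR rows in the quantile statistics for release_year follow directly from the other quantiles. A minimal sketch, using only the values reported above (the dictionary keys are illustrative names):

```python
# Quantiles copied from the release_year section of this report.
quantiles = {"min": 1874, "q1": 1978, "median": 2001, "q3": 2010, "max": 2020}

# Range spans the full data; IQR spans the middle 50% of rows.
value_range = quantiles["max"] - quantiles["min"]
iqr = quantiles["q3"] - quantiles["q1"]

print(f"range = {value_range} years")
print(f"IQR   = {iqr} years")
```

The median (2001) sits much closer to Q3 than to Q1, which agrees with the negative skewness reported for this variable: most films are recent, with a long tail of early releases.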
ValueCountFrequency (%)
2014 1976
 
4.3%
2015 1907
 
4.2%
2013 1895
 
4.2%
2012 1727
 
3.8%
2011 1669
 
3.7%
2016 1604
 
3.5%
2009 1591
 
3.5%
2010 1501
 
3.3%
2008 1482
 
3.3%
2007 1322
 
2.9%
Other values (125) 28777
63.3%
ValueCountFrequency (%)
1874 1
 
< 0.1%
1878 1
 
< 0.1%
1883 1
 
< 0.1%
1887 1
 
< 0.1%
1888 2
 
< 0.1%
1890 5
 
< 0.1%
1891 6
< 0.1%
1892 3
 
< 0.1%
1893 1
 
< 0.1%
1894 13
< 0.1%
ValueCountFrequency (%)
2020 1
 
< 0.1%
2018 5
 
< 0.1%
2017 531
 
1.2%
2016 1604
3.5%
2015 1907
4.2%
2014 1976
4.3%
2013 1895
4.2%
2012 1727
3.8%
2011 1669
3.7%
2010 1501
3.3%

return
Real number (ℝ)

HIGH CORRELATION  SKEWED  ZEROS 

Distinct 5232
Distinct (%) 11.5%
Missing 0
Missing (%) 0.0%
Infinite 0
Infinite (%) 0.0%
Mean 658.95421
Minimum 0
Maximum 12396383
Zeros 40058
Zeros (%) 88.1%
Negative 0
Negative (%) 0.0%
Memory size 355.2 KiB

Quantile statistics

Minimum 0
5-th percentile 0
Q1 0
median 0
Q3 0
95-th percentile 2.5358613
Maximum 12396383
Range 12396383
Interquartile range (IQR) 0

Descriptive statistics

Standard deviation 74631.645
Coefficient of variation (CV) 113.25771
Kurtosis 20707.13
Mean 658.95421
Median Absolute Deviation (MAD) 0
Skewness 138.44381
Sum 29950128
Variance 5.5698825 × 10^9
Monotonicity Not monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0 40058
88.1%
1 20
 
< 0.1%
2 12
 
< 0.1%
4 11
 
< 0.1%
5 8
 
< 0.1%
3 7
 
< 0.1%
2.5 7
 
< 0.1%
1.333333333 7
 
< 0.1%
1.5 6
 
< 0.1%
0.13615 4
 
< 0.1%
Other values (5222) 5311
 
11.7%
Value                    Count   Frequency (%)
0                        40058   88.1%
5.217391304 × 10^-7          1   < 0.1%
7.5 × 10^-7                  1   < 0.1%
9.375 × 10^-7                1   < 0.1%
1.499133126 × 10^-6          1   < 0.1%
1.8 × 10^-6                  1   < 0.1%
1.916666667 × 10^-6          1   < 0.1%
3.5 × 10^-6                  1   < 0.1%
4 × 10^-6                    1   < 0.1%
5.111111111 × 10^-6          1   < 0.1%
ValueCountFrequency (%)
12396383 1
< 0.1%
8500000 1
< 0.1%
4197476.625 1
< 0.1%
2755584 1
< 0.1%
1018619.283 1
< 0.1%
1000000 1
< 0.1%
26881.72043 1
< 0.1%
12890.38667 1
< 0.1%
5330.33945 1
< 0.1%
4133.333333 1
< 0.1%

cast
Text

Distinct18341
Distinct (%)42.6%
Missing2364
Missing (%)5.2%
Memory size355.2 KiB
2023-06-12T20:41:01.085013image/svg+xmlMatplotlib v3.7.1, https://matplotlib.org/

Length

Max length: 33
Median length: 30
Mean length: 13.19818
Min length: 3

Characters and Unicode

Total characters: 568670
Distinct characters: 232
Distinct categories: 11
Distinct scripts: 7
Distinct blocks: 9
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
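
As a rough illustration of how such character properties can be read, the sketch below counts Unicode general categories with Python's standard `unicodedata` module. It is not the profiler's own implementation; the helper name `category_counts` and the three sample names (taken from the sample rows of this variable) are used only for the example. The two-letter codes returned by `unicodedata.category` correspond to the long names in the report, for instance `Ll` for Lowercase Letter, `Lu` for Uppercase Letter and `Zs` for Space Separator.

```python
import unicodedata
from collections import Counter


def category_counts(values):
    """Count Unicode general categories over every character in `values`."""
    counts = Counter()
    for value in values:
        for ch in value:
            # category() returns a two-letter code such as "Lu" or "Ll".
            counts[unicodedata.category(ch)] += 1
    return counts


# Three names from the sample rows of the `cast` variable.
sample = ["Tom Hanks", "Robin Williams", "Walter Matthau"]
counts = category_counts(sample)
print(counts)
```

The standard library does not expose script or block names, so reproducing the script and block tables would need an additional data source.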

Unique

Unique: 12578
Unique (%): 29.2%

Sample

1st row: Tom Hanks
2nd row: Robin Williams
3rd row: Walter Matthau
4th row: Whitney Houston
5th row: Steve Martin
ValueCountFrequency (%)
john 722
 
0.8%
michael 606
 
0.7%
robert 534
 
0.6%
james 510
 
0.6%
richard 405
 
0.5%
david 388
 
0.4%
paul 324
 
0.4%
tom 317
 
0.4%
lee 305
 
0.3%
peter 290
 
0.3%
Other values (18495) 84089
95.0%

Most occurring characters

ValueCountFrequency (%)
a 53460
 
9.4%
e 51050
 
9.0%
(space) 45453
 
8.0%
n 40051
 
7.0%
r 37501
 
6.6%
i 36220
 
6.4%
o 32571
 
5.7%
l 26993
 
4.7%
t 19775
 
3.5%
s 19434
 
3.4%
Other values (222) 206162
36.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 431457
75.9%
Uppercase Letter 89699
 
15.8%
Space Separator 45453
 
8.0%
Dash Punctuation 1120
 
0.2%
Other Punctuation 820
 
0.1%
Other Letter 79
 
< 0.1%
Decimal Number 20
 
< 0.1%
Final Punctuation 14
 
< 0.1%
Nonspacing Mark 3
 
< 0.1%
Initial Punctuation 3
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 53460
12.4%
e 51050
11.8%
n 40051
9.3%
r 37501
 
8.7%
i 36220
 
8.4%
o 32571
 
7.5%
l 26993
 
6.3%
t 19775
 
4.6%
s 19434
 
4.5%
h 15359
 
3.6%
Other values (108) 99043
23.0%
Uppercase Letter
ValueCountFrequency (%)
M 7676
 
8.6%
S 7307
 
8.1%
B 6739
 
7.5%
C 6617
 
7.4%
J 6564
 
7.3%
A 5649
 
6.3%
D 5081
 
5.7%
R 4997
 
5.6%
L 4515
 
5.0%
K 4239
 
4.7%
Other values (57) 30315
33.8%
Other Letter
ValueCountFrequency (%)
8
 
10.1%
ی 6
 
7.6%
5
 
6.3%
5
 
6.3%
م 5
 
6.3%
5
 
6.3%
ا 4
 
5.1%
3
 
3.8%
3
 
3.8%
3
 
3.8%
Other values (20) 32
40.5%
Other Punctuation
ValueCountFrequency (%)
. 773
94.3%
, 27
 
3.3%
" 12
 
1.5%
· 3
 
0.4%
! 2
 
0.2%
\ 2
 
0.2%
: 1
 
0.1%
Decimal Number
ValueCountFrequency (%)
0 10
50.0%
5 7
35.0%
2 3
 
15.0%
Final Punctuation
ValueCountFrequency (%)
13
92.9%
1
 
7.1%
Space Separator
ValueCountFrequency (%)
(space) 45453
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1120
100.0%
Nonspacing Mark
ValueCountFrequency (%)
́ 3
100.0%
Initial Punctuation
ValueCountFrequency (%)
3
100.0%
Open Punctuation
ValueCountFrequency (%)
2
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 520847
91.6%
Common 47432
 
8.3%
Cyrillic 309
 
0.1%
Han 45
 
< 0.1%
Arabic 31
 
< 0.1%
Inherited 3
 
< 0.1%
Hangul 3
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 53460
 
10.3%
e 51050
 
9.8%
n 40051
 
7.7%
r 37501
 
7.2%
i 36220
 
7.0%
o 32571
 
6.3%
l 26993
 
5.2%
t 19775
 
3.8%
s 19434
 
3.7%
h 15359
 
2.9%
Other values (135) 188433
36.2%
Cyrillic
ValueCountFrequency (%)
а 30
 
9.7%
и 30
 
9.7%
е 26
 
8.4%
о 22
 
7.1%
р 21
 
6.8%
н 21
 
6.8%
л 18
 
5.8%
к 17
 
5.5%
с 16
 
5.2%
в 10
 
3.2%
Other values (30) 98
31.7%
Common
ValueCountFrequency (%)
(space) 45453
95.8%
- 1120
 
2.4%
. 773
 
1.6%
, 27
 
0.1%
13
 
< 0.1%
" 12
 
< 0.1%
0 10
 
< 0.1%
5 7
 
< 0.1%
2 3
 
< 0.1%
· 3
 
< 0.1%
Other values (6) 11
 
< 0.1%
Han
ValueCountFrequency (%)
8
17.8%
5
11.1%
5
11.1%
5
11.1%
3
 
6.7%
3
 
6.7%
3
 
6.7%
3
 
6.7%
3
 
6.7%
3
 
6.7%
Other values (4) 4
8.9%
Arabic
ValueCountFrequency (%)
ی 6
19.4%
م 5
16.1%
ا 4
12.9%
د 3
9.7%
ع 2
 
6.5%
ه 2
 
6.5%
پ 2
 
6.5%
ن 2
 
6.5%
س 1
 
3.2%
ت 1
 
3.2%
Other values (3) 3
9.7%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Inherited
ValueCountFrequency (%)
́ 3
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 565103
99.4%
None 3150
 
0.6%
Cyrillic 309
 
0.1%
CJK 45
 
< 0.1%
Arabic 31
 
< 0.1%
Punctuation 19
 
< 0.1%
Latin Ext Additional 7
 
< 0.1%
Diacriticals 3
 
< 0.1%
Hangul 3
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 53460
 
9.5%
e 51050
 
9.0%
(space) 45453
 
8.0%
n 40051
 
7.1%
r 37501
 
6.6%
i 36220
 
6.4%
o 32571
 
5.8%
l 26993
 
4.8%
t 19775
 
3.5%
s 19434
 
3.4%
Other values (53) 202595
35.9%
None
ValueCountFrequency (%)
é 735
23.3%
á 348
 
11.0%
í 203
 
6.4%
ö 181
 
5.7%
ü 148
 
4.7%
è 137
 
4.3%
ô 130
 
4.1%
ó 130
 
4.1%
ä 99
 
3.1%
ć 79
 
2.5%
Other values (77) 960
30.5%
Cyrillic
ValueCountFrequency (%)
а 30
 
9.7%
и 30
 
9.7%
е 26
 
8.4%
о 22
 
7.1%
р 21
 
6.8%
н 21
 
6.8%
л 18
 
5.8%
к 17
 
5.5%
с 16
 
5.2%
в 10
 
3.2%
Other values (30) 98
31.7%
Punctuation
ValueCountFrequency (%)
13
68.4%
3
 
15.8%
2
 
10.5%
1
 
5.3%
CJK
ValueCountFrequency (%)
8
17.8%
5
11.1%
5
11.1%
5
11.1%
3
 
6.7%
3
 
6.7%
3
 
6.7%
3
 
6.7%
3
 
6.7%
3
 
6.7%
Other values (4) 4
8.9%
Arabic
ValueCountFrequency (%)
ی 6
19.4%
م 5
16.1%
ا 4
12.9%
د 3
9.7%
ع 2
 
6.5%
ه 2
 
6.5%
پ 2
 
6.5%
ن 2
 
6.5%
س 1
 
3.2%
ت 1
 
3.2%
Other values (3) 3
9.7%
Diacriticals
ValueCountFrequency (%)
́ 3
100.0%
Latin Ext Additional
ValueCountFrequency (%)
1
14.3%
1
14.3%
ế 1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%

crew
Text

Distinct: 20595
Distinct (%): 46.1%
Missing: 756
Missing (%): 1.7%
Memory size: 355.2 KiB

Length

Max length: 33
Median length: 29
Mean length: 13.508916
Min length: 3

Characters and Unicode

Total characters: 603781
Distinct characters: 221
Distinct categories: 10
Distinct scripts: 8
Distinct blocks: 8
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 13445
Unique (%): 30.1%

Sample

1st row: John Lasseter
2nd row: Larry J. Franco
3rd row: Howard Deutch
4th row: Forest Whitaker
5th row: Alan Silvestri
ValueCountFrequency (%)
john 1136
 
1.2%
david 837
 
0.9%
michael 779
 
0.8%
robert 729
 
0.8%
william 500
 
0.5%
james 469
 
0.5%
peter 459
 
0.5%
paul 444
 
0.5%
richard 422
 
0.4%
mark 370
 
0.4%
Other values (19121) 88111
93.5%

Most occurring characters

ValueCountFrequency (%)
a 52620
 
8.7%
e 52507
 
8.7%
(space) 49662
 
8.2%
r 41654
 
6.9%
n 40982
 
6.8%
i 39321
 
6.5%
o 34755
 
5.8%
l 27140
 
4.5%
s 21864
 
3.6%
t 19842
 
3.3%
Other values (211) 223434
37.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 454066
75.2%
Uppercase Letter 95653
 
15.8%
Space Separator 49662
 
8.2%
Other Punctuation 3169
 
0.5%
Dash Punctuation 1193
 
0.2%
Other Letter 28
 
< 0.1%
Decimal Number 7
 
< 0.1%
Nonspacing Mark 1
 
< 0.1%
Close Punctuation 1
 
< 0.1%
Open Punctuation 1
 
< 0.1%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 52620
11.6%
e 52507
11.6%
r 41654
 
9.2%
n 40982
 
9.0%
i 39321
 
8.7%
o 34755
 
7.7%
l 27140
 
6.0%
s 21864
 
4.8%
t 19842
 
4.4%
h 16605
 
3.7%
Other values (108) 106776
23.5%
Uppercase Letter
ValueCountFrequency (%)
M 8382
 
8.8%
S 7978
 
8.3%
J 6952
 
7.3%
B 6393
 
6.7%
A 5992
 
6.3%
C 5990
 
6.3%
R 5875
 
6.1%
D 5210
 
5.4%
L 4836
 
5.1%
G 4563
 
4.8%
Other values (58) 33482
35.0%
Other Letter
ValueCountFrequency (%)
م 3
 
10.7%
ا 3
 
10.7%
د 3
 
10.7%
ح 2
 
7.1%
ی 2
 
7.1%
1
 
3.6%
1
 
3.6%
1
 
3.6%
و 1
 
3.6%
ي 1
 
3.6%
Other values (10) 10
35.7%
Other Punctuation
ValueCountFrequency (%)
. 3135
98.9%
, 19
 
0.6%
\ 11
 
0.3%
" 2
 
0.1%
& 1
 
< 0.1%
· 1
 
< 0.1%
Decimal Number
ValueCountFrequency (%)
0 3
42.9%
5 2
28.6%
9 1
 
14.3%
3 1
 
14.3%
Space Separator
ValueCountFrequency (%)
(space) 49662
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 1193
100.0%
Nonspacing Mark
ValueCountFrequency (%)
́ 1
100.0%
Close Punctuation
ValueCountFrequency (%)
) 1
100.0%
Open Punctuation
ValueCountFrequency (%)
( 1
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 549505
91.0%
Common 54033
 
8.9%
Cyrillic 198
 
< 0.1%
Arabic 18
 
< 0.1%
Greek 16
 
< 0.1%
Han 7
 
< 0.1%
Hangul 3
 
< 0.1%
Inherited 1
 
< 0.1%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 52620
 
9.6%
e 52507
 
9.6%
r 41654
 
7.6%
n 40982
 
7.5%
i 39321
 
7.2%
o 34755
 
6.3%
l 27140
 
4.9%
s 21864
 
4.0%
t 19842
 
3.6%
h 16605
 
3.0%
Other values (121) 202215
36.8%
Cyrillic
ValueCountFrequency (%)
и 23
 
11.6%
а 17
 
8.6%
р 17
 
8.6%
е 16
 
8.1%
л 14
 
7.1%
о 11
 
5.6%
к 9
 
4.5%
н 8
 
4.0%
в 8
 
4.0%
д 7
 
3.5%
Other values (31) 68
34.3%
Common
ValueCountFrequency (%)
(space) 49662
91.9%
. 3135
 
5.8%
- 1193
 
2.2%
, 19
 
< 0.1%
\ 11
 
< 0.1%
0 3
 
< 0.1%
" 2
 
< 0.1%
5 2
 
< 0.1%
9 1
 
< 0.1%
& 1
 
< 0.1%
Other values (4) 4
 
< 0.1%
Greek
ValueCountFrequency (%)
ρ 2
12.5%
ς 2
12.5%
Γ 1
 
6.2%
ι 1
 
6.2%
τ 1
 
6.2%
η 1
 
6.2%
ί 1
 
6.2%
ώ 1
 
6.2%
ν 1
 
6.2%
α 1
 
6.2%
Other values (4) 4
25.0%
Arabic
ValueCountFrequency (%)
م 3
16.7%
ا 3
16.7%
د 3
16.7%
ح 2
11.1%
ی 2
11.1%
و 1
 
5.6%
ي 1
 
5.6%
ن 1
 
5.6%
پ 1
 
5.6%
ع 1
 
5.6%
Han
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
Inherited
ValueCountFrequency (%)
́ 1
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 599868
99.4%
None 3682
 
0.6%
Cyrillic 198
 
< 0.1%
Arabic 18
 
< 0.1%
CJK 7
 
< 0.1%
Latin Ext Additional 4
 
< 0.1%
Hangul 3
 
< 0.1%
Diacriticals 1
 
< 0.1%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 52620
 
8.8%
e 52507
 
8.8%
(space) 49662
 
8.3%
r 41654
 
6.9%
n 40982
 
6.8%
i 39321
 
6.6%
o 34755
 
5.8%
l 27140
 
4.5%
s 21864
 
3.6%
t 19842
 
3.3%
Other values (55) 219521
36.6%
None
ValueCountFrequency (%)
é 874
23.7%
á 343
 
9.3%
í 231
 
6.3%
ö 205
 
5.6%
ó 198
 
5.4%
ô 173
 
4.7%
è 167
 
4.5%
ä 154
 
4.2%
ç 133
 
3.6%
ü 133
 
3.6%
Other values (81) 1071
29.1%
Cyrillic
ValueCountFrequency (%)
и 23
 
11.6%
а 17
 
8.6%
р 17
 
8.6%
е 16
 
8.1%
л 14
 
7.1%
о 11
 
5.6%
к 9
 
4.5%
н 8
 
4.0%
в 8
 
4.0%
д 7
 
3.5%
Other values (31) 68
34.3%
Arabic
ValueCountFrequency (%)
م 3
16.7%
ا 3
16.7%
د 3
16.7%
ح 2
11.1%
ی 2
11.1%
و 1
 
5.6%
ي 1
 
5.6%
ن 1
 
5.6%
پ 1
 
5.6%
ع 1
 
5.6%
Latin Ext Additional
ValueCountFrequency (%)
2
50.0%
1
25.0%
1
25.0%
Hangul
ValueCountFrequency (%)
1
33.3%
1
33.3%
1
33.3%
CJK
ValueCountFrequency (%)
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
1
14.3%
Diacriticals
ValueCountFrequency (%)
́ 1
100.0%

Interactions

[Figure: grid of pairwise interaction plots (81 panels); the images are not preserved in this text version.]

Correlations

(variable) | budget | id | popularity | revenue | runtime | vote_average | vote_count | release_year | return | status
budget | 1.000 | -0.255 | 0.462 | 0.644 | 0.227 | 0.072 | 0.484 | 0.141 | 0.775 | 0.000
id | -0.255 | 1.000 | -0.410 | -0.278 | -0.205 | -0.149 | -0.433 | 0.392 | -0.262 | 0.056
popularity | 0.462 | -0.410 | 1.000 | 0.491 | 0.306 | 0.241 | 0.893 | 0.186 | 0.447 | 0.000
revenue | 0.644 | -0.278 | 0.491 | 1.000 | 0.254 | 0.127 | 0.513 | 0.103 | 0.852 | 0.000
runtime | 0.227 | -0.205 | 0.306 | 0.254 | 1.000 | 0.193 | 0.290 | 0.034 | 0.234 | 0.000
vote_average | 0.072 | -0.149 | 0.241 | 0.127 | 0.193 | 1.000 | 0.318 | -0.008 | 0.119 | 0.019
vote_count | 0.484 | -0.433 | 0.893 | 0.513 | 0.290 | 0.318 | 1.000 | 0.197 | 0.474 | 0.000
release_year | 0.141 | 0.392 | 0.186 | 0.103 | 0.034 | -0.008 | 0.197 | 1.000 | 0.086 | 0.028
return | 0.775 | -0.262 | 0.447 | 0.852 | 0.234 | 0.119 | 0.474 | 0.086 | 1.000 | 0.000
status | 0.000 | 0.056 | 0.000 | 0.000 | 0.000 | 0.019 | 0.000 | 0.028 | 0.000 | 1.000
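
The "High correlation" alerts in the overview can be related back to this matrix by listing the variable pairs whose coefficient exceeds a cut-off. The sketch below does this for the nine numeric columns. The 0.5 threshold is an assumption: it is not stated in this report, although it reproduces the alerted pairs. The helper name `high_pairs` is made up for the example, and the coefficients are copied from the matrix by hand.

```python
# Coefficients copied from the correlation matrix (status column omitted).
names = ["budget", "id", "popularity", "revenue", "runtime",
         "vote_average", "vote_count", "release_year", "return"]
matrix = [
    [1.000, -0.255, 0.462, 0.644, 0.227, 0.072, 0.484, 0.141, 0.775],
    [-0.255, 1.000, -0.410, -0.278, -0.205, -0.149, -0.433, 0.392, -0.262],
    [0.462, -0.410, 1.000, 0.491, 0.306, 0.241, 0.893, 0.186, 0.447],
    [0.644, -0.278, 0.491, 1.000, 0.254, 0.127, 0.513, 0.103, 0.852],
    [0.227, -0.205, 0.306, 0.254, 1.000, 0.193, 0.290, 0.034, 0.234],
    [0.072, -0.149, 0.241, 0.127, 0.193, 1.000, 0.318, -0.008, 0.119],
    [0.484, -0.433, 0.893, 0.513, 0.290, 0.318, 1.000, 0.197, 0.474],
    [0.141, 0.392, 0.186, 0.103, 0.034, -0.008, 0.197, 1.000, 0.086],
    [0.775, -0.262, 0.447, 0.852, 0.234, 0.119, 0.474, 0.086, 1.000],
]

THRESHOLD = 0.5  # assumed cut-off, not taken from the report


def high_pairs(names, matrix, threshold):
    """Return (name_a, name_b, coefficient) for each pair with |r| >= threshold."""
    pairs = []
    for i in range(len(names)):
        # Only the upper triangle is scanned, since the matrix is symmetric.
        for j in range(i + 1, len(names)):
            if abs(matrix[i][j]) >= threshold:
                pairs.append((names[i], names[j], matrix[i][j]))
    return pairs


for a, b, r in high_pairs(names, matrix, THRESHOLD):
    print(f"{a} ~ {b}: {r:+.3f}")
```

With this threshold the pairs are budget/revenue, budget/return, popularity/vote_count, revenue/vote_count and revenue/return, which matches the variables named in the alerts.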

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
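
One way to make nullity correlation concrete is to turn each column into a 0/1 "is missing" indicator and take the Pearson correlation between two indicators. The sketch below uses only the standard library. The toy columns are invented for illustration (they merely borrow column names from this dataset), `None` stands in for a missing value, and the helper names are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A column that is never (or always) missing has no defined correlation.
        return float("nan")
    return cov / math.sqrt(var_x * var_y)


def nullity_correlation(col_a, col_b):
    """Correlate the missingness patterns of two columns (None = missing)."""
    a = [1 if v is None else 0 for v in col_a]
    b = [1 if v is None else 0 for v in col_b]
    return pearson(a, b)


# Invented toy columns, not values from the dataset.
tagline = ["x", None, None, "y", None, "z"]
belongs_to_collection = ["c", None, None, None, None, "d"]
production_companies = [None, "a", "b", None, "c", None]

print(nullity_correlation(tagline, belongs_to_collection))
print(nullity_correlation(tagline, production_companies))
```

A value near 1 means the two columns tend to be missing in the same rows, a value near -1 means one tends to be present exactly where the other is missing, and a value near 0 means the patterns are unrelated.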

Sample

(index) | belongs_to_collection | budget | genres | id | original_language | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | cast | crew
0 | Toy Story Collection | 30000000 | Animation, Comedy, Family | 862 | en | Led by Woody, Andy's toys live happily in his room until Andy's birthday brings Buzz Lightyear onto the scene. Afraid of losing his place in Andy's heart, Woody plots against Buzz. But when circumstances separate Buzz and Woody from their owner, the duo eventually learns to put aside their differences. | 21.946943 | Pixar Animation Studios | United States of America | 1995-10-30 | 373554033 | 81.0 | English | Released | NaN | Toy Story | 7.7 | 5415.0 | 1995 | 12.451801 | Tom Hanks | John Lasseter
1 | NaN | 65000000 | Adventure, Fantasy, Family | 8844 | en | When siblings Judy and Peter discover an enchanted board game that opens the door to a magical world, they unwittingly invite Alan -- an adult who's been trapped inside the game for 26 years -- into their living room. Alan's only hope for freedom is to finish the game, which proves risky as all three find themselves running from giant rhinoceroses, evil monkeys and other terrifying creatures. | 17.015539 | TriStar Pictures | United States of America | 1995-12-15 | 262797249 | 104.0 | English | Released | Roll the dice and unleash the excitement! | Jumanji | 6.9 | 2413.0 | 1995 | 4.043035 | Robin Williams | Larry J. Franco
2 | Grumpy Old Men Collection | 0 | Romance, Comedy | 15602 | en | A family wedding reignites the ancient feud between next-door neighbors and fishing buddies John and Max. Meanwhile, a sultry Italian divorcée opens a restaurant at the local bait shop, alarming the locals who worry she'll scare the fish away. But she's less interested in seafood than she is in cooking up a hot time with Max. | 11.712900 | Warner Bros. | United States of America | 1995-12-22 | 0 | 101.0 | English | Released | Still Yelling. Still Fighting. Still Ready for Love. | Grumpier Old Men | 6.5 | 92.0 | 1995 | 0.000000 | Walter Matthau | Howard Deutch
3 | NaN | 16000000 | Comedy, Drama, Romance | 31357 | en | Cheated on, mistreated and stepped on, the women are holding their breath, waiting for the elusive "good man" to break a string of less-than-stellar lovers. Friends and confidants Vannah, Bernie, Glo and Robin talk it all out, determined to find a better way to breathe. | 3.859495 | Twentieth Century Fox Film Corporation | United States of America | 1995-12-22 | 81452156 | 127.0 | English | Released | Friends are the people who let you be yourself... and never let you forget it. | Waiting to Exhale | 6.1 | 34.0 | 1995 | 5.090760 | Whitney Houston | Forest Whitaker
4 | Father of the Bride Collection | 0 | Comedy | 11862 | en | Just when George Banks has recovered from his daughter's wedding, he receives the news that she's pregnant ... and that George's wife, Nina, is expecting too. He was planning on selling their home, but that's a plan that -- like George -- will have to change with the arrival of both a grandchild and a kid of his own. | 8.387519 | Sandollar Productions | United States of America | 1995-02-10 | 76578911 | 106.0 | English | Released | Just When His World Is Back To Normal... He's In For The Surprise Of His Life! | Father of the Bride Part II | 5.7 | 173.0 | 1995 | 0.000000 | Steve Martin | Alan Silvestri
5 | NaN | 60000000 | Action, Crime, Drama, Thriller | 949 | en | Obsessive master thief, Neil McCauley leads a top-notch crew on various insane heists throughout Los Angeles while a mentally unstable detective, Vincent Hanna pursues him without rest. Each man recognizes and respects the ability and the dedication of the other even though they are aware their cat-and-mouse game may end in violence. | 17.924927 | Regency Enterprises | United States of America | 1995-12-15 | 187436818 | 170.0 | English | Released | A Los Angeles Crime Saga | Heat | 7.7 | 1886.0 | 1995 | 3.123947 | Al Pacino | Michael Mann
6 | NaN | 58000000 | Comedy, Romance | 11860 | en | An ugly duckling having undergone a remarkable change, still harbors feelings for her crush: a carefree playboy, but not before his business-focused brother has something to say about it. | 6.677277 | Paramount Pictures | Germany | 1995-12-15 | 0 | 127.0 | Français | Released | You are cordially invited to the most surprising merger of the year. | Sabrina | 6.2 | 141.0 | 1995 | 0.000000 | Harrison Ford | Sydney Pollack
7 | NaN | 0 | Action, Adventure, Drama, Family | 45325 | en | A mischievous young boy, Tom Sawyer, witnesses a murder by the deadly Injun Joe. Tom becomes friends with Huckleberry Finn, a boy with no future and no family. Tom has to choose between honoring a friendship or honoring an oath because the town alcoholic is accused of the murder. Tom and Huck go through several adventures trying to retrieve evidence. | 2.561161 | Walt Disney Pictures | United States of America | 1995-12-22 | 0 | 97.0 | English | Released | The Original Bad Boys. | Tom and Huck | 5.4 | 45.0 | 1995 | 0.000000 | Jonathan Taylor Thomas | David Loughery
8 | NaN | 35000000 | Action, Adventure, Thriller | 9091 | en | International action superstar Jean Claude Van Damme teams with Powers Boothe in a Tension-packed, suspense thriller, set against the back-drop of a Stanley Cup game.Van Damme portrays a father whose daughter is suddenly taken during a championship hockey game. With the captors demanding a billion dollars by game's end, Van Damme frantically sets a plan in motion to rescue his daughter and abort an impending explosion before the final buzzer... | 5.231580 | Universal Pictures | United States of America | 1995-12-22 | 64350171 | 106.0 | English | Released | Terror goes into overtime. | Sudden Death | 5.5 | 174.0 | 1995 | 1.838576 | Jean-Claude Van Damme | Peter Hyams
9 | James Bond Collection | 58000000 | Adventure, Action, Thriller | 710 | en | James Bond must unmask the mysterious head of the Janus Syndicate and prevent the leader from utilizing the GoldenEye weapons system to inflict devastating revenge on Britain. | 14.686036 | United Artists | United Kingdom | 1995-11-16 | 352194034 | 130.0 | English | Released | No limits. No fears. No substitutes. | GoldenEye | 6.6 | 1194.0 | 1995 | 6.072311 | Pierce Brosnan | Martin Campbell
(index) | belongs_to_collection | budget | genres | id | original_language | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | cast | crew
45441 | NaN | 0 | NaN | 67179 | it | Sentenced to life imprisonment for illegal activities, Italian International member Giulio Manieri holds on to his political ideals while struggling against madness in the loneliness of his prison cell. | 0.225051 | NaN | NaN | 1972-01-01 | 0 | 90.0 | Italiano | Released | NaN | St. Michael Had a Rooster | 6.0 | 3.0 | 1972 | 0.0 | Giulio Brogi | Leo Tolstoy
45442 | NaN | 0 | Horror, Mystery, Thriller | 84419 | en | An unsuccessful sculptor saves a madman named "The Creeper" from drowning. Seeing an opportunity for revenge, he tricks the psycho into murdering his critics. | 0.222814 | Universal Pictures | United States of America | 1946-03-29 | 0 | 65.0 | English | Released | Meet...The CREEPER! | House of Horrors | 6.3 | 8.0 | 1946 | 0.0 | Rondo Hatton | Russell A. Gausman
45443 | NaN | 0 | Mystery, Horror | 390959 | en | In this true-crime documentary, we delve into the murder spree that was the inspiration for Joe Berlinger's "Book of Shadows: Blair Witch 2". | 0.076061 | NaN | NaN | 2000-10-22 | 0 | 45.0 | English | Released | NaN | Shadow of the Blair Witch | 7.0 | 2.0 | 2000 | 0.0 | Tony Abatemarco | Ben Rock
45444 | NaN | 0 | Horror | 289923 | en | A film archivist revisits the story of Rustin Parr, a hermit thought to have murdered seven children while under the possession of the Blair Witch. | 0.386450 | Neptune Salad Entertainment | United States of America | 2000-10-03 | 0 | 30.0 | English | Released | Do you know what happened 50 years before "The Blair Witch Project"? | The Burkittsville 7 | 7.0 | 1.0 | 2000 | 0.0 | Monty Bane | Ben Rock
45445 | NaN | 0 | Science Fiction | 222848 | en | It's the year 3000 AD. The world's most dangerous women are banished to a remote asteroid 45 million light years from earth. Kira Murphy doesn't belong; wrongfully accused of a crime she did not commit, she's thrown in this interplanetary prison and left to her own defenses. But Kira's a fighter, and soon she finds herself in the middle of a female gang war; where everyone wants a piece of the action... and a piece of her! "Caged Heat 3000" takes the Women-in-Prison genre to a whole new level... and a whole new galaxy! | 0.661558 | Concorde-New Horizons | United States of America | 1995-01-01 | 0 | 85.0 | English | Released | NaN | Caged Heat 3000 | 3.5 | 1.0 | 1995 | 0.0 | Lisa Boyle | Roger Corman
45446 | NaN | 0 | Drama, Action, Romance | 30840 | en | Yet another version of the classic epic, with enough variation to make it interesting. The story is the same, but some of the characters are quite different from the usual, in particular Uma Thurman's very special maid Marian. The photography is also great, giving the story a somewhat darker tone. | 5.683753 | Westdeutscher Rundfunk (WDR) | Canada | 1991-05-13 | 0 | 104.0 | English | Released | NaN | Robin Hood | 5.7 | 26.0 | 1991 | 0.0 | Patrick Bergin | John Irvin
45447 | NaN | 0 | Drama | 111109 | tl | An artist struggles to finish his work while a storyline about a cult plays in his head. | 0.178241 | Sine Olivia | Philippines | 2011-11-17 | 0 | 360.0 | NaN | Released | NaN | Century of Birthing | 9.0 | 3.0 | 2011 | 0.0 | Angel Aquino | Lav Diaz
45448 | NaN | 0 | Action, Drama, Thriller | 67758 | en | When one of her hits goes wrong, a professional assassin ends up with a suitcase full of a million dollars belonging to a mob boss ... | 0.903007 | American World Pictures | United States of America | 2003-08-01 | 0 | 90.0 | English | Released | A deadly game of wits. | Betrayal | 3.8 | 6.0 | 2003 | 0.0 | Erika Eleniak | Mark L. Lester
45449 | NaN | 0 | NaN | 227506 | en | In a small town live two brothers, one a minister and the other one a hunchback painter of the chapel who lives with his wife. One dreadful and stormy night, a stranger knocks at the door asking for shelter. The stranger talks about all the good things of the earthly life the minister is missing because of his puritanical faith. The minister comes to accept the stranger's viewpoint but it is others who will pay the consequences because the minister will discover the human pleasures thanks to, ehem, his sister- in -law… The tormented minister and his cuckolded brother will die in a strange accident in the chapel and later an infant will be born from the minister's adulterous relationship. | 0.003503 | Yermoliev | Russia | 1917-10-21 | 0 | 87.0 | NaN | Released | NaN | Satan Triumphant | 0.0 | 0.0 | 1917 | 0.0 | Iwan Mosschuchin | Yakov Protazanov
45450 | NaN | 0 | NaN | 461257 | en | 50 years after decriminalisation of homosexuality in the UK, director Daisy Asquith mines the jewels of the BFI archive to take us into the relationships, desires, fears and expressions of gay men and women in the 20th century. | 0.163015 | NaN | United Kingdom | 2017-06-09 | 0 | 75.0 | English | Released | NaN | Queerama | 0.0 | 0.0 | 2017 | 0.0 | NaN | Daisy Asquith
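
The report does not define the `return` column, but the sample rows are consistent with it being `revenue / budget`, set to 0 when `budget` is 0. The sketch below checks that reading against a few of the first rows. The definition is an inference from the sample values rather than something the report states, and `compute_return` is a hypothetical helper.

```python
def compute_return(revenue, budget):
    """Assumed definition: revenue divided by budget, 0 when budget is 0."""
    if budget == 0:
        return 0.0
    return revenue / budget


# (title, budget, revenue, return as reported) copied from the first sample rows.
rows = [
    ("Toy Story", 30000000, 373554033, 12.451801),
    ("Jumanji", 65000000, 262797249, 4.043035),
    ("Grumpier Old Men", 0, 0, 0.0),
    ("Waiting to Exhale", 16000000, 81452156, 5.090760),
    ("Father of the Bride Part II", 0, 76578911, 0.0),
    ("Heat", 60000000, 187436818, 3.123947),
]

for title, budget, revenue, reported in rows:
    computed = compute_return(revenue, budget)
    print(f"{title}: computed {computed:.6f}, reported {reported:.6f}")
```

Under this reading, the 88.1% of zeros in `return` largely follows from the 80.4% of rows with a zero `budget` and the 83.7% with a zero `revenue`.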

Duplicate rows

Most frequently occurring

(index) | belongs_to_collection | budget | genres | id | original_language | overview | popularity | production_companies | production_countries | release_date | revenue | runtime | spoken_languages | status | tagline | title | vote_average | vote_count | release_year | return | cast | crew | # duplicates
34 | NaN | 0 | Thriller, Mystery | 141971 | fi | Recovering from a nail gun shot to the head and 13 months of coma, doctor Pekka Valinta starts to unravel the mystery of his past, still suffering from total amnesia. | 0.411949 | Filmiteollisuus Fine | Finland | 2008-12-26 | 0 | 108.0 | suomi | Released | Which one is the first to return - memory or the murderer? | Blackout | 6.7 | 3.0 | 2008 | 0.0 | Petteri Summanen | JP Siili | 9
7 | Why We Fight | 0 | Documentary | 159849 | en | The third film of Frank Capra's 'Why We Fight" propaganda film series, dealing with the Nazi conquest of Western Europe in 1940. | 0.473322 | NaN | United States of America | 1943-01-01 | 0 | 57.0 | English | Released | NaN | Why We Fight: Divide and Conquer | 5.0 | 1.0 | 1943 | 0.0 | Knox Manning | Frank Capra | 4
11 | NaN | 0 | Action, Horror, Science Fiction | 18440 | en | When a comet strikes Earth and kicks up a cloud of toxic dust, hundreds of humans join the ranks of the living dead. But there's bad news for the survivors: The newly minted zombies are hell-bent on eradicating every last person from the planet. For the few human beings who remain, going head to head with the flesh-eating fiends is their only chance for long-term survival. Yet their battle will be dark and cold, with overwhelming odds. | 1.436085 | NaN | United States of America | 2007-01-01 | 0 | 89.0 | English | Released | NaN | Days of Darkness | 5.0 | 5.0 | 2007 | 0.0 | Sabrina Gennarino | Jake Kennedy | 4
12 | NaN | 0 | Adventure, Animation, Drama, Action, Foreign | 23305 | en | In feudal India, a warrior (Khan) who renounces his role as the longtime enforcer to a local lord becomes the prey in a murderous hunt through the Himalayan mountains. | 1.967992 | Filmfour | France | 2001-09-23 | 0 | 86.0 | हिन्दी | Released | NaN | The Warrior | 6.3 | 15.0 | 2001 | 0.0 | Irrfan Khan | Asif Kapadia | 4
14 | NaN | 0 | Comedy | 97995 | en | After breaking a mirror in his home, superstitious Max tries to avoid situations which could bring bad luck but in doing so, causes himself the worst luck imaginable. | 0.141558 | Max Linder Productions | United States of America | 1921-02-06 | 0 | 62.0 | English | Released | NaN | Seven Years Bad Luck | 5.6 | 4.0 | 1921 | 0.0 | Max Linder | Charles Van Enger | 4
15 | NaN | 0 | Comedy, Drama | 11115 | en | As an ex-gambler teaches a hot-shot college kid some things about playing cards, he finds himself pulled into the world series of poker, where his protégé is his toughest competition. | 6.880365 | Andertainment Group | United States of America | 2008-01-29 | 0 | 85.0 | English | Released | NaN | Deal | 5.2 | 22.0 | 2008 | 0.0 | Burt Reynolds | Eric Strand | 4
16 | NaN | 0 | Comedy, Drama | 265189 | sv | While holidaying in the French Alps, a Swedish family deals with acts of cowardliness as an avalanche breaks out. | 12.165685 | Motlys | Norway | 2014-08-15 | 1359497 | 118.0 | Français | Released | NaN | Force Majeure | 6.8 | 255.0 | 2014 | 0.0 | Lisa Loven Kongsli | Ruben Östlund | 4
17 | NaN | 0 | Crime, Drama, Thriller | 5511 | fr | Hitman Jef Costello is a perfectionist who always carefully plans his murders and who never gets caught. | 9.091288 | Fida cinematografica | France | 1967-10-25 | 39481 | 105.0 | Français | Released | There is no solitude greater than that of the Samurai | Le Samouraï | 7.9 | 187.0 | 1967 | 0.0 | Alain Delon | Henri Decaë | 4
21 | NaN | 0 | Drama | 25541 | da | Former Danish servicemen Lars and Jimmy are thrown together while training in a neo-Nazi group. Moving from hostility through grudging admiration to friendship and finally passion, events take a darker turn when their illicit relationship is uncovered. | 2.587911 | NaN | Sweden | 2009-10-21 | 0 | 90.0 | Dansk | Released | NaN | Brotherhood | 7.1 | 21.0 | 2009 | 0.0 | Nicolas Bro | Nicolo Donato | 4
25 | NaN | 0 | Drama, Comedy | 168538 | en | In Zola's Paris, an ingenue arrives at a tony bordello: she's Nana, guileless, but quickly learning to use her erotic innocence to get what she wants. She's an actress for a soft-core filmmaker and soon is the most popular courtesan in Paris, parlaying this into a house, bought for her by a wealthy banker. She tosses him and takes up with her neighbor, a count of impeccable rectitude, and with the count's impressionable son. The count is soon fetching sticks like a dog and mortgaging his lands to satisfy her whims. | 1.276602 | Cannon Group | NaN | 1983-06-13 | 0 | 92.0 | NaN | Released | NaN | Nana, the True Key of Pleasure | 4.7 | 3.0 | 1983 | 0.0 | Katya Berger | Marc Behm | 4
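
A table like the one above can be approximated by grouping identical rows and keeping the groups that occur more than once. The sketch below does this with the standard library on a handful of invented `(title, release_year)` tuples. It is not the profiler's own code, the helper name `duplicate_groups` is hypothetical, and it assumes missing values are represented by `None` so that identical rows compare equal.

```python
from collections import Counter


def duplicate_groups(rows):
    """Return (row, count) for every row seen more than once, most frequent first."""
    counts = Counter(rows)  # rows must be hashable, e.g. tuples
    return [(row, n) for row, n in counts.most_common() if n > 1]


# Invented toy rows: (title, release_year).
rows = [
    ("Blackout", 2008),
    ("Deal", 2008),
    ("Blackout", 2008),
    ("Heat", 1995),
    ("Blackout", 2008),
    ("Deal", 2008),
]

for row, n in duplicate_groups(rows):
    print(row, n)
```

Here each count is the total number of occurrences of the row, including the first one. Whether the "# duplicates" column follows the same convention is not stated in this report.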